Abstract

We consider the problem of testing significance of predictors in multivariate nonpara- 
metric quantile regression. A stochastic process is proposed, which is based on a com- 
parison of the responses with a nonparametric quantile regression estimate under the null 
hypothesis. It is demonstrated that under the null hypothesis this process converges weakly 
to a centered Gaussian process and the asymptotic properties of the test under fixed and 
local alternatives are also discussed. In particular we show that, in contrast to the nonparametric approach based on estimation of $L^2$-distances, the new test is able to detect local alternatives which converge to the null hypothesis at any rate $a_n \to 0$ such that $a_n \sqrt{n} \to \infty$ (here $n$ denotes the sample size). We also present a small simulation study illustrating the finite sample properties of a bootstrap version of the corresponding Kolmogorov-Smirnov test.

1 Introduction

Nonparametric regression methods have become very popular in the last decades because of 
the fact that employing a mis-specified parametric model will typically result in inconsistent 
estimates and as a consequence invalid statistical inference. In recent years many authors 
have developed nonparametric quantile regression estimates, which provide an attractive supplement to least squares methods by focussing on the estimation of the conditional quantiles instead of the mean function [see Chaudhuri (1991), Yu and Jones (1997), Yu and Jones (1998), Dette and Volgushev (2008), Chernozhukov et al. (2010) or Bondell et al. (2010) among many
others]. These references mainly discuss the case of a one dimensional predictor, but from a 
theoretical point of view the methods can easily be generalized to multivariate predictors. On 
the other hand it is well known that in practical applications such nonparametric methods suf- 
fer from the curse of dimensionality and therefore do not yield precise estimates of conditional 
quantile surfaces for realistic sample sizes. In such cases a natural and very important question 
is which predictor variables are significant. 

The problem of testing significance has found considerable interest in multivariate mean regression models. Gozalo (1993) considered conditional moment tests, while Yatchew (1992) constructed a test based on semi-parametric least-squares residuals. Lavergne and Vuong (1996) suggested a directional testing procedure for discriminating between two sets of regressors without specifying the functional form of the mean regression, and Racine (1997) proposed a test based on nonparametric estimates of the partial derivatives of the conditional mean of the response. Lavergne and Vuong (2000) used the kernel method to develop a test for the significance of a subset of explanatory variables and Delgado and Gonzalez-Manteiga (2001) proposed a test which is based on functionals of a $U$-process.

Because of the well known robustness properties of the conditional quantile and the fact that 
conditional quantiles characterize the entire distribution it is of particular interest to develop 
methods for testing significance of predictors in quantile regression models. Surprisingly, in 
quantile regression this problem has found much less attention. Variable selection in the framework of linear quantile regression models has been recently considered by Zou and Yuan (2008), Wu and Liu (2009) and Belloni and Chernozhukov (2011) among others. Jeong et al. (2012) proposed a test for significance in a multivariate quantile regression model. The work of these authors was motivated by Granger quantile causality [Granger (1969)] and they employed an idea of Zheng (1998), who proposed to transform quantile restrictions to mean restrictions. The corresponding test is based on a $U$-statistic, which estimates the distance measure



(1.1) $A = E\big[(P(Y < q_\tau(X) \mid X, Z) - \tau)^2 f_Z(Z)\big],$

where $Y$ denotes the response, $(X, Z)$ is the predictor, $f_Z$ the density of $Z$ and $q_\tau(X)$ the conditional $\tau$-quantile of $Y$ given $X$. Note that the quantity $A$ vanishes if and only if the conditional quantile of $Y$ given $X$ and $Z$ does not depend on $Z$. A major drawback of this approach lies in the fact that non-parametric smoothing over both $X$ and $Z$ is needed for the construction of the estimate. This implies that the test is of very limited use when the dimension of $(X, Z)$ is larger than 3. Moreover, this test can only detect local alternatives converging to the null hypothesis $H_0 : A = 0$ at a rate $n^{-1/2} h^{-(d+q)/4}$, where $d$ and $q$ are the dimensions of the predictors $X$ and $Z$, respectively, and $h$ denotes a bandwidth converging to $0$ with increasing sample size $n$.

The present paper is devoted to the problem of constructing a test for the hypothesis of the 
significance of the predictor Z, i.e. A = 0, in the nonparametric quantile regression model, 
which can detect local alternatives converging to the null hypothesis at a parametric rate and at the same time does not depend on the dimension of the predictor $Z$, such that smoothing with respect to the covariate $Z$ can be avoided. To be precise, the test proposed in this paper can detect alternatives converging to $H_0$ at any rate $a_n \to 0$ such that $a_n \sqrt{n} \to \infty$, where $n$ denotes the sample size. Our approach is based on an empirical process, which estimates the functional

(1.2) $T(x,z) = E[(P(Y < q_\tau(X) \mid X, Z) - \tau) I\{X < x\} I\{Z < z\}] = E[(I\{Y < q_\tau(X)\} - \tau) I\{X < x\} I\{Z < z\}]$

for all (x,z) in the support of the distribution of the predictor (X,Z), where the inequality 
X < x between the vectors X and x is understood as the vector of inequalities between the 
corresponding coordinates and I{A} denotes the characteristic function of the event A. The 
model, necessary notation and definition of this process are introduced in Section 2 and a stochastic expansion of the process $T_n(x,z)$ is established in Section 3. This result allows us to obtain the weak convergence of an appropriately scaled and centered version of $T_n(x,z)$ under the null hypothesis, fixed and local alternatives. As a result we obtain a Kolmogorov-Smirnov or a Cramér-von Mises type statistic for the hypothesis of the significance of the predictor $Z$ in the nonparametric quantile regression model. Moreover, we are also able to extend the result to the case where the dimension $q$ of the predictor $Z$ is growing with the sample size, that is $q = q_n \to \infty$ as $n \to \infty$. The finite sample properties of a corresponding bootstrap test are investigated in Section 4. As a by-product of our theoretical analysis we also obtain new results on the uniform convergence of the conditional quantile estimator proposed by Dette and Volgushev (2008). Finally, all proofs, which are complicated, are deferred to an Appendix in Section A.



2 Model, assumptions and test statistic 

Let $Y$, $X$ and $Z$ denote one-, $d$- and $q$-dimensional random variables, respectively, where $Y$ corresponds to the response and $X$ and $Z$ are the covariates. We assume that the random variables $\{(Y_i, X_i, Z_i)\}_{i=1,\dots,n}$ are independent identically distributed with the same distribution as $(Y, X, Z)$. Let $\tau \in (0, 1)$ be fixed. Our aim is to test whether the predictor $Z$ has influence on the conditional $\tau$-quantile of $Y$, given $(X, Z)$, or whether the variable $Z$ can be omitted. Note that this problem fundamentally differs from the question whether $Y$ is independent of $Z$ given $X$. In fact, the latter is equivalent to testing whether all quantile curves do not depend on $Z$ as opposed to looking at a particular quantile. Thus for fixed $\tau \in (0, 1)$ we formulate the null hypothesis as

(2.1) $H_0 : E[I\{Y < q_\tau(X)\} - \tau \mid X, Z] = P(Y < q_\tau(X) \mid X, Z) - \tau = 0$ a.s.,
where $q_\tau(X)$ is defined as the conditional $\tau$-quantile of $Y$, given $X$, that is

(2.2) $P(Y < q_\tau(X) \mid X) = \tau.$

It is easy to see that the null hypothesis (2.1) is equivalent to

$T(x,z) = 0$

for all $(x, z)$ in the support of the random variable $(X, Z)$, where the functional $T$ is defined in (1.2). This functional can be estimated by the stochastic process

(2.3) $T_n(x, z) = \frac{1}{n} \sum_{i=1}^n (I\{Y_i < \hat q_\tau(X_i)\} - \tau) I\{X_i < x\} I\{Z_i < z\},$

where $(x, z) \in R_X \times R_Z$, $R_X$ and $R_Z$ denote the support of the distributions of the random variables $X$ and $Z$, respectively, and $\hat q_\tau$ is an appropriate estimate of the conditional quantile of $Y$ given $X$, which will be specified below. A test for the hypothesis of significance of the variable $Z$ for the $\tau$-th quantile curve of $Y$ can now easily be obtained by considering a Kolmogorov-Smirnov or Cramér-von Mises type statistic based on $T_n$ and rejecting the null hypothesis for large values
of this statistic. Throughout this paper we assume that the sets Rx and Rz are compact. 
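To illustrate how a Kolmogorov-Smirnov type statistic based on the process in (2.3) may be evaluated, the following Python sketch computes the maximum of $|T_n(x,z)|$ over the grid of sample points for scalar $X$ and $Z$. It is a minimal sketch under stated assumptions: the function name `ks_statistic` is hypothetical, the quantile estimates are passed in as a plain list (in the illustration the true conditional quantile is plugged in instead of a nonparametric estimate), and non-strict inequalities are used in the indicators, which makes no difference almost surely for continuous data.

```python
import random
from statistics import NormalDist


def ks_statistic(y, x, z, q_hat, tau):
    """Return the maximum over sample points (x_j, z_k) of |T_n(x_j, z_k)|.

    y, x, z : lists of scalar observations of equal length n
    q_hat   : list with q_hat[i] standing in for the quantile estimate at x[i]
    tau     : quantile level in (0, 1)
    """
    n = len(y)
    # "Residuals" I{Y_i <= q_hat(X_i)} - tau
    resid = [(1.0 if y[i] <= q_hat[i] else 0.0) - tau for i in range(n)]
    best = 0.0
    for xj in x:
        for zk in z:
            t = sum(r for r, xi, zi in zip(resid, x, z)
                    if xi <= xj and zi <= zk) / n
            best = max(best, abs(t))
    return best


# Illustration: Y = X + 0.5 * eps does not depend on Z, so the null hypothesis holds.
rng = random.Random(1)
n, tau = 50, 0.5
x = [rng.random() for _ in range(n)]
z = [rng.random() for _ in range(n)]
y = [xi + 0.5 * rng.gauss(0.0, 1.0) for xi in x]
# Assumption: the true conditional quantile replaces the nonparametric estimate.
q_true = [xi + 0.5 * NormalDist().inv_cdf(tau) for xi in x]
print(ks_statistic(y, x, z, q_true, tau))
```

Since the process is a step function that only changes at the coordinates of the observations, evaluating it on the grid of sample points is enough to find its supremum in this setting.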
In the literature, several nonparametric quantile regression estimators have been proposed [see e.g. Yu and Jones (1997, 1998), Takeuchi et al. (2006), Chernozhukov et al. (2010) or Bondell et al. (2010) among others]. In this paper we will use an approach proposed by Dette and Volgushev (2008) who constructed non-crossing estimates of quantile curves using a simultaneous inversion and isotonization of a preliminary estimator of the conditional distribution function $F_{Y|X}$ of $Y$ given $X$. For this estimator, say $\hat F_{Y|X}(y|x; p)$, we will use a smoothed local polynomial estimator of order $p$, see e.g. Fan and Gijbels (1996). Before defining this estimator, it is necessary to introduce some notation.



• For $d$-dimensional vectors $x = (x(1), \dots, x(d)) \in \mathbb{R}^d$ and $k = (k(1), \dots, k(d)) \in \mathbb{N}_0^d$ define

$x^k := (x(1)^{k(1)}, \dots, x(d)^{k(d)}), \quad \sigma(k) := k(1) + \dots + k(d), \quad \pi(x) := x(1) \cdot x(2) \cdots x(d), \quad k! := k(1)! \cdots k(d)!$



For ci-dimensional vectors x G M. d , k 6 NjJ and a function K : 



define 



K(x) 
K (m) (x) 



K{x{l))---K{x{d)) 



K hn:k (x) := K{x/h n )-K{{x/h n ) 



K^\x{l)).--K^\x{d)) , KW(x) :=K<g{z/h, 



where $m = (m(1), \dots, m(d))$ is a $d$-dimensional vector with entries from $\mathbb{N}_0$ and $K^{(\ell)}$ is the $\ell$-th derivative of a function $K$.

• Define $N_j := \#\{k \in \mathbb{N}_0^d \mid \sigma(k) = j\}$ as the number of distinct $d$-tuples with size $j$, and denote the elements of this set by $k_{1,j}, \dots, k_{N_j,j}$.

With these notational conventions the local polynomial estimator $\hat F_{Y|X}(y|x; p)$ of order $p$ can be represented as [see e.g. Fan and Gijbels (1996)]

(2.4) $\hat F_{Y|X}(y|x; p) := e_1^t (X^t W X)^{-1} X^t W Y,$

where $e_1$ denotes a vector of suitable dimension with first entry one and remaining entries zero, the matrices $X$, $W$ and the vector $Y$ are given by



(2.5) 



X 

w 

Y 



\l (x-X n ) k ^ . 
-^Diag^K^x - X x 



x-X n )*»^ (x-X„ 



(x - Xtj^p \ 
J 



(x-X n ) k *>' N p 



nh d n 



n 



y-Y 1 



y-Y n 



and $\Omega$ denotes a smoothed version of the indicator function $I\{\cdot < 0\}$, that is



(2.6) 



uj(u)du 



for a given kernel $\omega$ with support $[-1, 1]$. Following Dette and Volgushev (2008) we consider a strictly increasing distribution function $G : \mathbb{R} \to (0, 1)$, a nonnegative kernel $\kappa$ with bandwidth $b_n$, and define the functional

(2.7) $H_{G,\kappa,\tau,b_n}(F) := \frac{1}{b_n} \int_0^1 \int_{-\infty}^{\tau} \kappa\Big(\frac{F(G^{-1}(u)) - v}{b_n}\Big)\, dv\, du.$



If $\hat F_{Y|X}$ is the estimator of the conditional distribution function defined in (2.4), it is intuitively clear that $H_{G,\kappa,\tau,b_n}(\hat F_{Y|X}(\cdot|x))$ is a consistent estimate of $H_{G,\kappa,\tau,b_n}(F_{Y|X}(\cdot|x))$. If $b_n \to 0$, this quantity can be approximated as follows

$H_{G,\kappa,\tau,b_n}(F_{Y|X}(\cdot|x)) \approx \int_{\mathbb{R}} I\{F_{Y|X}(y|x) < \tau\}\, dG(y) = \int_0^1 I\{F_{Y|X}(G^{-1}(v)|x) < \tau\}\, dv = G \circ F_{Y|X}^{-1}(\tau|x),$

and as a consequence an estimate of the conditional quantile function $q_\tau(x) = F_{Y|X}^{-1}(\tau|x)$ can be defined by

(2.8) $\hat q_\tau(x) := G^{-1}(H_{G,\kappa,\tau,b_n}(\hat F_{Y|X}(\cdot|x))).$
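The inversion step behind this estimate can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it works in the limiting case $b_n \to 0$ (the indicator is used in place of the kernel-smoothed functional), a known normal distribution function stands in for the estimated conditional distribution function at a fixed $x$, the logistic distribution function is chosen for $G$, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def G(y):
    """Logistic distribution function (strictly increasing, values in (0, 1))."""
    return 1.0 / (1.0 + math.exp(-y))


def G_inv(u):
    """Inverse of the logistic distribution function."""
    return math.log(u / (1.0 - u))


def quantile_by_inversion(cdf, tau, m=20000):
    """Approximate G^{-1}( int_0^1 I{cdf(G^{-1}(v)) < tau} dv ) by a midpoint rule."""
    hits = sum(1 for i in range(m) if cdf(G_inv((i + 0.5) / m)) < tau)
    return G_inv(hits / m)


dist = NormalDist(mu=1.0, sigma=2.0)  # stands in for the conditional cdf at a fixed x
tau = 0.25
approx = quantile_by_inversion(dist.cdf, tau)
exact = dist.inv_cdf(tau)
print(approx, exact)  # the two values should agree up to discretization error
```

The integral equals $G(F^{-1}(\tau))$ because the integrand is one exactly for those $v$ below $G(F^{-1}(\tau))$, so applying $G^{-1}$ recovers the quantile whatever strictly increasing $G$ is used.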



Throughout this paper, we will assume that the kernels, the function G and the bandwidth 
parameters used to build the estimator satisfy the following conditions 

(K1) The kernel $K$ has support $[-1, 1]$ and is $p + 1 > d + 2$ times continuously differentiable with uniformly bounded derivatives. Additionally the first $p + 1$ derivatives of $K$ vanish at the boundary points $-1$ and $1$.

(K2) The function $\omega$ in (2.6) is a kernel of order $s > d + 1$, has support $[-1, 1]$ and is $d$ times continuously differentiable. Additionally $\omega$ has uniformly bounded derivatives that vanish at the boundary points $-1$ and $1$.

(K3) The kernel $\kappa$ is symmetric and positive with support $[-1, 1]$ and has one Lipschitz-continuous derivative.

(K4) $G : \mathbb{R} \to [0, 1]$ is a strictly increasing distribution function such that $G$, $G^{-1}$ are two times continuously differentiable.



(K5) d 2 n s + h? n +1 = o(l/Vn) and logn/ '(nh 3 n d+2 ) + log n / {nh^d™- 1 ) 
(K6) ^ = ^ + ^ + ^ = 



o(l) 



Remark 2.1 Dette and Volgushev (2008) demonstrate that the choice of the distribution function $G$ has a negligible impact on the quality of the resulting estimate provided that an obvious centering and standardization is performed. Similarly, the estimate $\hat q_\tau(x)$ is robust with respect to the choice of the bandwidth $b_n$ if it is chosen sufficiently small [see Dette et al. (2006)].



Remark 2.2 Dette and Volgushev (2008) only established point-wise weak convergence of their estimator. However, for most applications such as the construction of tests on the basis of this estimator, uniform results are needed. In the present paper, we provide general inequalities for the operator $H_{G,\kappa,\tau,b_n}$ defined in (2.7), see Lemma B.4 in the Appendix. In particular, these findings allow us to describe uniform properties of the quantile estimator $\hat q_\tau$ in terms of the properties of the underlying distribution function estimator $\hat F_{Y|X}$. For example, in Theorem A.1 in the Appendix we exploit those bounds to derive a uniform Bahadur-type representation for the estimate $\hat q_\tau$ defined in (2.8).

In the following discussion it turns out to be advantageous to consider a generalization of the test statistic $T_n$ defined in (2.3), where the indicator functions $I\{X_i < x\}$ are replaced by indicators of more general sets $\Theta$. To be precise let $\Xi$ denote a collection of subsets of $\mathbb{R}^d$ and define $\mathcal{D}_n := \{x \in R_X \mid [x - h_n 1, x + h_n 1] \subset R_X\}$ (here $1$ denotes the $d$-dimensional vector with all entries equal to 1), then all theoretical developments will be based on the statistic

(2.9) $T_n(\Theta, z) = \frac{1}{n} \sum_{i=1}^n (I\{Y_i < \hat q_\tau(X_i)\} - \tau) I\{X_i \in \Theta \cap \mathcal{D}_n\} I\{Z_i < z\}, \quad \Theta \in \Xi,\ z \in R_Z.$

The intersection of the sets $\Theta \in \Xi$ with the set $\mathcal{D}_n$ is needed in the theoretical developments to exclude "residuals" $I\{Y_i < \hat q_\tau(X_i)\} - \tau$ corresponding to predictors close to the boundary of $R_X$. Note that if $\bigcup_{\Theta \in \Xi} \Theta$ has a positive distance to the boundary of $R_X$, the collection of sets $\Xi_n$ will equal $\Xi$ whenever $h_n$ is sufficiently small. Note also that we use the same symbol $T_n$ for the processes in (2.3) and (2.9) but the meaning is always clear from the context.
In addition to its advantages from a theoretical point of view, the consideration of a collection of sets that are more general than rectangles will for example allow us to investigate the problem of testing the significance of the variable $Z$ on a certain subset, say $\mathcal{D} \subset R_X$, that is

(2.10) $H_0^{\mathcal{D}} : E[I\{Y < q_\tau(X)\} I\{X \in \mathcal{D}\} \mid X, Z] = P(Y < q_\tau(X) \text{ and } X \in \mathcal{D} \mid X, Z) = \tau.$

Note that $H_0^{\mathcal{D}}$ means that the conditional $\tau$-quantile of $Y$ given $(X, Z)$ can be represented as a function $q_\tau(X)$ for $X \in \mathcal{D} \subset R_X$. In this case a natural choice for the collection $\Xi$ is given by $\Xi := \{\{X < t\} \cap \mathcal{D} \mid t \in \mathbb{R}^d\}$, but other choices are of course possible as well.



3 Main asymptotic results 

In this section we investigate the asymptotic properties of the stochastic process defined in (2.9). For this purpose we need some additional notation and technical assumptions which are collected here for convenience and for later reference.

Define the 'error' variables as $\varepsilon = Y - q_\tau(X)$ and $\varepsilon_i = Y_i - q_\tau(X_i)$, $i = 1, \dots, n$. We assume that the conditional distribution function $F_{\varepsilon|X}(\cdot|x)$ of $\varepsilon$ given $X = x$ has a density, say $f_{\varepsilon|X}(y|x)$. Note that by definition we have that $F_{\varepsilon|X}(0|X) = P(\varepsilon < 0 \mid X) = \tau$. In particular, this identity continues to hold even if the null hypothesis is violated. Throughout this paper we denote by $F_{Z|X,\varepsilon}(z|x, e)$ the conditional distribution function of $Z$ given $(X, \varepsilon) = (x, e)$. Define $\mathcal{D} := \bigcup_{\Theta \in \Xi} \Theta$, then we assume that the data-generating process satisfies the following conditions.



(A1) The conditional distribution function $F_{Y|X}(y|x)$ is $p + 1$ times continuously differentiable with respect to $x, y$ and all partial derivatives are uniformly bounded on $\mathbb{R} \times R_X$. The joint density of $(X, Y)$ is uniformly bounded on $R_X \times \mathbb{R}$. Moreover, $p > \max(s, d + 1)$.

(A2) The density $f_X$ of the predictor $X$ is $d + 1 + n_f$ times continuously differentiable with uniformly bounded partial derivatives on $R_X$ and $n_f > d/2$. Moreover $\inf_{x \in R_X} f_X(x) > 0$.

(A3) There exist constants $a, C_1 > 0$ such that

$\inf_{(x,y) :\, x \in R_X,\, |y - q_\tau(x)| < a} f_{Y|X}(y|x) > C_1.$

(A4) The function $(z, x) \mapsto F_{Z|X,\varepsilon}(z|x, 0)$ is Hölder-continuous of order $\gamma > 0$ with respect to $z$ and $x$ uniformly in $x \in \mathcal{D}$, i.e.



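$|F_{Z|X,\varepsilon}(s|x, 0) - F_{Z|X,\varepsilon}(t|\xi, 0)| \le C \|(s, x) - (t, \xi)\|^{\gamma}$

for some finite constant $C$.

(A5) $\sup_{x \in \mathcal{D},\, y \in \mathbb{R},\, z \in \mathcal{Z}} |f'_{\varepsilon|X,Z}(y|x, z)| < \infty.$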



In conditions (A1)-(A4), $R_X$ can be replaced by a set $\mathcal{X} \subset R_X$ provided that $\mathcal{D} \subset \mathcal{X}$. Finally, the following assumptions on the collection of sets $\Xi$ are required.

(S1) The class of functions $\mathcal{F}_1 = \{u \mapsto I\{u \in \Theta\} \mid \Theta \in \Xi\}$ satisfies $N_{[\,]}(\mathcal{F}_1, \epsilon, L^2(P_X)) \le C \epsilon^{-a}$ for any sufficiently small $\epsilon > 0$ and a constant $C$, where $N_{[\,]}$ denotes the bracketing number [see van der Vaart and Wellner (1996)].

(S2) $\sup_{\Theta \in \Xi} P(X_i \in \Theta,\ \exists j : [X_i(j) - h_n, X_i(j) + h_n] \not\subset \Theta) = o(1)$ for $h_n \to 0$.



Remark 3.1 Conditions (S1) and (S2) are not strong and are for example satisfied for the collection of rectangles $\Xi = \{\{s < X < t\} \mid s, t \in \mathbb{R}^d\}$ if $X$ has a uniformly bounded density with compact support. For more details on bracketing numbers and their properties we refer to the monograph of van der Vaart and Wellner (1996).



The following result gives a stochastic expansion of the process $T_n(\Theta, z)$ under general conditions, which is crucial for deriving the asymptotic properties of the process $T_n$. In particular, observe that this representation continues to hold under the alternative.

Theorem 3.2 If the assumptions (K1)-(K6), (A1)-(A5) and (S1), (S2) are satisfied, the process $T_n$ can be represented as

(3.1) $T_n(\Theta, z) = \frac{1}{n} \sum_{i=1}^n (I\{\varepsilon_i < 0\} - \tau) I\{X_i \in \Theta_n\} (I\{Z_i < z\} - F_{Z|X,\varepsilon}(z|X_i, 0)) + o_P(n^{-1/2})$

uniformly with respect to $z \in R_Z$, $\Theta \in \Xi$.

The proof of Theorem 3.2 is complicated and is given in the Appendix. As an immediate consequence, we obtain that under the null hypothesis $H_0$ the rescaled process $\sqrt{n}\, T_n(\Theta, z)$ converges weakly to a centered Gaussian process.



Corollary 3.3 If the assumptions of Theorem 3.2 and the null hypothesis $H_0$ in (2.1) are satisfied, the process $\sqrt{n}\, T_n$ converges weakly in $\ell^\infty(\Xi \times R_Z)$ to a centered Gaussian process $T$ with covariance kernel

(3.2) $k(\Theta_1, y, \Theta_2, z) = \mathrm{Cov}(T(\Theta_1, y), T(\Theta_2, z)) = \tau(1 - \tau)\, E\Big[ I\{X \in \Theta_1 \cap \Theta_2\} \times E\big[ (I\{Z < y\} - F_{Z|X,\varepsilon}(y|X, 0)) (I\{Z < z\} - F_{Z|X,\varepsilon}(z|X, 0)) \,\big|\, X, \varepsilon \big] \Big].$



As a consequence of this result we obtain the weak convergence of functionals such as the 
Kolmogorov-Smirnov statistic 

$K_n = \sup_{\Theta \in \Xi} \sup_{z \in R_Z} |T_n(\Theta, z)|$

by an application of the continuous mapping theorem. In general the asymptotic distribution 
of K n depends on certain features of the data generating process and in the following section 
we will discuss bootstrap approximations for this distribution. However, in some special cases 
the situation simplifies substantially. 

Remark 3.4 In the case where the pair $(X, \varepsilon)$ and the covariate $Z$ are independent it follows from (3.2) that

$\mathrm{Cov}(T(\Theta_1, y), T(\Theta_2, z)) = \tau(1 - \tau)\, P(X \in \Theta_1 \cap \Theta_2)\, (F_Z(y \wedge z) - F_Z(y) F_Z(z)),$

where $F_Z$ is the distribution function of the random variable $Z$ and $y \wedge z$ denotes the vector of minima of the corresponding coordinates of $y$ and $z$. If additionally $X$, $Z$ are real-valued and $\Xi = \{(-\infty, t] \mid t \in \mathbb{R}\}$, the asymptotic covariance in Theorem 3.2 reduces to

$\mathrm{Cov}(T((-\infty, t], y), T((-\infty, s], z)) = \tau(1 - \tau)\, F_X(s \wedge t)\, (F_Z(y \wedge z) - F_Z(y) F_Z(z)).$

Hence, for univariate independent covariates $X$ and $Z$ with continuous distribution functions $F_X$ and $F_Z$, respectively, the Kolmogorov-Smirnov test is asymptotically distribution-free because in this case the statistic

$\sqrt{n} \sup_{x \in R_X,\, z \in R_Z} |T_n(x, z)| = \sqrt{n} \sup_{s, t \in [0,1]} |T_n(F_X^{-1}(s), F_Z^{-1}(t))|$

converges in distribution to $\sqrt{\tau(1 - \tau)}\, \sup_{s, t \in [0,1]} |B(s, t)|$, where $B$ is the Kiefer-Müller process on $[0, 1]^2$, i.e. a centered Gaussian process with covariance kernel

$\mathrm{Cov}(B(s_1, t_1), B(s_2, t_2)) = (s_1 \wedge s_2)(t_1 \wedge t_2 - t_1 t_2).$
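In the distribution-free case, critical values could be approximated by simulating the supremum of the Kiefer-Müller process. The sketch below is a minimal illustration under stated assumptions: the process is approximated on a finite grid by the normalized sum $m^{-1/2} \sum_i \xi_i I\{U_i \le s\}(I\{V_i \le t\} - t)$ with independent uniform $U_i$, $V_i$ and independent random signs $\xi_i$, which has exactly the covariance kernel $(s_1 \wedge s_2)(t_1 \wedge t_2 - t_1 t_2)$ and is approximately Gaussian for large $m$. The function names, the grid size and the number of draws are hypothetical choices made for this illustration.

```python
import math
import random


def km_cov(s1, t1, s2, t2):
    """Covariance kernel (s1 ^ s2)(t1 ^ t2 - t1 t2) of the Kiefer-Mueller process."""
    return min(s1, s2) * (min(t1, t2) - t1 * t2)


def sup_draw(n_terms, grid, rng):
    """One draw of the maximum over grid x grid of |W(s, t)| for the approximating sum W."""
    xi = [rng.choice((-1.0, 1.0)) for _ in range(n_terms)]
    u = [rng.random() for _ in range(n_terms)]
    v = [rng.random() for _ in range(n_terms)]
    best = 0.0
    for s in grid:
        for t in grid:
            w = sum(xi[i] * ((1.0 if v[i] <= t else 0.0) - t)
                    for i in range(n_terms) if u[i] <= s)
            best = max(best, abs(w) / math.sqrt(n_terms))
    return best


def critical_value(tau, alpha, draws=200, n_terms=100, grid_size=10, seed=0):
    """Monte Carlo (1 - alpha)-quantile of sqrt(tau (1 - tau)) * sup |B| on a grid."""
    rng = random.Random(seed)
    grid = [(j + 1) / grid_size for j in range(grid_size)]
    sups = sorted(sup_draw(n_terms, grid, rng) for _ in range(draws))
    idx = min(draws - 1, math.ceil((1.0 - alpha) * draws) - 1)
    return math.sqrt(tau * (1.0 - tau)) * sups[idx]


print(critical_value(tau=0.5, alpha=0.05))
```

A coarse grid slightly underestimates the supremum, so finer grids and more draws would be needed before such values are used in practice.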

The result obtained in Theorem 3.2 can also be used to derive the asymptotic properties of the test statistic under fixed alternatives. More precisely, the following result holds (note that under the null hypothesis, the centering term is zero, and thus this result is a generalization of Corollary 3.3).

Corollary 3.5 Under the assumptions of Theorem 3.2 the process

$\sqrt{n} \Big( T_n(\Theta, z) - \int_{R_X \cap \Theta_n} \int_{R_Z} (F_{Y|X,Z}(q_\tau(u)|u, v) - \tau)\, I\{v < z\}\, dF_{X,Z}(u, v) \Big)$

converges weakly to the limiting process $T$ defined in Corollary 3.3.

Remark 3.6 A further consequence of Corollary 3.5 is that the statistic $T_n$ converges for all $\Theta \in \Xi$ and $z \in R_Z$ in probability to the function

$\int_{R_X \cap \Theta} \int_{R_Z} (F_{Y|X,Z}(q_\tau(u)|u, v) - \tau) (I\{v < z\} - F_{Z|X,\varepsilon}(z|u, 0))\, f_X(u) f_Z(v)\, du\, dv.$

Consequently, if $\Xi$ contains sufficiently many sets (for example, if $\Xi = \{(-\infty, x] \mid x \in \mathbb{R}^d\}$), the test is consistent. In order to obtain the asymptotic distribution of the test statistic under local alternatives of the form

(3.3) $F^{(n)}_{Y|X,Z}(q^{(n)}_\tau(u)|u, v) = \tau + a_n h(u, v)$

a result on the asymptotic behavior of $T_n(\Theta, z)$ is required when the data are generated from triangular arrays. A closer look at the proofs in the Appendix shows that such a result does indeed hold under suitable modifications of the conditions in Theorem 3.2. The details are omitted for the sake of brevity. In particular, a test based on the Kolmogorov-Smirnov test statistic will detect all local alternatives for which the quantity

$\sup_{\Theta, z} \Big| \sqrt{n} \int_{R_X \cap \Theta_n} \int_{R_Z} (F^{(n)}_{Y|X,Z}(q^{(n)}_\tau(u)|u, v) - \tau)\, I\{v < z\}\, dF^{(n)}_{X,Z}(u, v) \Big|$

diverges to infinity (the superscript is used to indicate that the corresponding quantities depend on $n$). For example $K_n \to \infty$ in probability if $\Xi = \{(-\infty, x] \mid x \in \mathbb{R}^d\}$ and $F^{(n)}_{Y|X,Z}(q^{(n)}_\tau(u)|u, v) = \tau + a_n h(u, v)$ for some function $h$ that is not identically zero on $R_X \times R_Z$ and a sequence $a_n$ with $a_n \sqrt{n} \to \infty$. This means that the test can detect alternatives converging to the null hypothesis at rates which are "larger but arbitrarily close" to the parametric rate $n^{-1/2}$. Moreover, the test will have an asymptotically non-trivial power against many local alternatives that converge to zero at the exact parametric rate $n^{-1/2}$.

Remark 3.7 We now give a brief discussion of the properties of the proposed test statistic when alternatives of increasing dimension are considered, i.e. when the dimension of the predictor $Z$, say $q_n$, varies with $n$. Consider the additional assumption

(Z) The $L^2$ covering numbers of the classes of functions $\{x \mapsto F_{Z|X,\varepsilon}(z|x + s, 0) \mid z \in \mathcal{Z},\ \|s\|_\infty < a\}$ and $\{\xi \mapsto I\{\xi < z\} \mid z \in \mathcal{Z}\}$ are bounded by $C_1 (C_2/\epsilon)^{k_n}$ for some finite constants $C_1, C_2$.

Note that assumption (Z) holds with $k_n = q_n$ if for each $n$ the predictor $Z$ given $(X, \varepsilon)$ has a conditional density $f_{Z|X,\varepsilon}$ that satisfies

$\sup_z |f_{Z|X,\varepsilon}(z|x_1, 0) - f_{Z|X,\varepsilon}(z|x_2, 0)| \le C \|x_1 - x_2\|$

for a finite constant $C$ independent of $n$. Under assumptions (K1)-(K6), (A1)-(A3), (Z), (A5), (S1), (S2) it is possible to prove that

$T_n(\Theta, z) = \frac{1}{n} \sum_{i=1}^n (I\{\varepsilon_i < 0\} - \tau) I\{X_i \in \Theta_n\} (I\{Z_i < z\} - F_{Z|X,\varepsilon}(z|X_i, 0)) + o_P(n^{-1/2})$

uniformly with respect to $z \in R_Z$, $\Theta \in \Xi$. In particular, this result implies

$\sqrt{n} \Big( T_n(\Theta, z) - \int_{R_X \cap \Theta_n} \int_{R_Z} (F_{Y|X,Z}(q_\tau(u)|u, v) - \tau)\, I\{v < z\}\, dF_{X,Z}(u, v) \Big) = O_P(k_n).$

Consequently, the test is able to detect local alternatives converging to the null hypothesis at any rate $a_n$ such that $\frac{a_n}{k_n} \sqrt{n} \to \infty$ when the sample size and the dimension $k_n$ of $Z$ are increasing.



Remark 3.8 Jeong et al. (2012) investigated an alternative test for the hypothesis (2.1) based on ideas from Fan and Li (1996) in combination with a modification which was originally proposed by Zheng (1998). Their test is based on the statistic

1 



Jn 



L((Z, t - Z^/g^Ity < Q(r\Xi)} - r)(I{Y J < Q^X,)} - r) 
where $L$ is a kernel and $g_n$ is a bandwidth converging to $0$ with increasing sample size. These authors claimed that a normalized version of this test statistic converges to a normal distribution. It should be pointed out here that the proof in that paper is not correct. The basic argument of Jeong et al. (2012) consists in the statement that the fact

$\sup_x |\hat Q_\tau(x) - Q_\tau(x)| \le C_n$

results in the estimate

(3.4) $J_{nU} \le J_n \le J_{nL},$

where the statistics $J_{nU}$ and $J_{nL}$ are defined by



Jnu — —, 7T—j'y^L(.(Z i — Zj)/g)ei U ej U , 

n[n — 1 )q a 



n(n — l)g 



and em = I{Y, L + C n < Q T (Xi)} - r, e iL = I{Yi + C n < Q r (JQ)} - r (see equation (A. 11-3) 
in that paper). A simple calculation shows that this conclusion is not correct and in fact the inequality (3.4) does not hold. It turns out that the proof of Theorem 1 in Jeong et al. (2012) cannot be corrected easily.



Even if the gap in the proof were closed, the test of Jeong et al. (2012) would still have two major drawbacks. First, it requires non-parametric smoothing with respect to the covariate $Z$. Second, it can only detect local alternatives converging to the null hypothesis at a rate $n^{-1/2} h^{-(d+q)/4}$, which is slower than the rate $b_n n^{-1/2}$ for any $b_n \to \infty$ detected by the test proposed in this paper and additionally depends on the dimension of the covariates.



4 Bootstrap and simulation results 



In general the limit distribution derived in Theorem 3.2 depends on certain features of the data generating process which are difficult to estimate. For this reason we discuss in this section bootstrap methods that are suitable to mimic the distribution of test statistics based on $T_n$ under the null hypothesis. To be precise, let $P^*$ denote the conditional probability $P(\cdot \mid \mathcal{Y}_n)$, given the original sample $\mathcal{Y}_n = \{(Y_i, X_i, Z_i) \mid i = 1, \dots, n\}$, and denote by $E^*$ and $\mathrm{Cov}^*$ the corresponding conditional expectation and covariance. Several residual wild bootstrap approximations have been proposed in the literature for quantile regression analysis [see Sun (2006) or Feng et al. (2011)]. However, the residual wild bootstrap does not yield a valid approximation of the limiting distribution in the present context because it does not lead to an expansion of the
bootstrap process analogous to the one given for T n in Theorem | 



(2001 


) or 


He and Zhu 


(2003) 



where q T denotes an estimator for the conditional r-quantile of Y i} given X { 



$\hat\varepsilon_i := Y_i - \hat q_\tau(X_i)$,
define $\hat\tau := \sum_{j=1}^n I\{\hat\varepsilon_j < 0\}/n$ and introduce independent identically distributed Bernoulli random variables $B_1, \dots, B_n$ with success probability $\hat\tau$, which are independent of the original data. Define the bootstrap process as

1 n 



i=l 

where 



denotes a kernel estimator for the conditional distribution $F_{Z|X,\varepsilon}(\cdot|x, y)$. Here, $L$ and $N$ denote $d$- and one-dimensional kernel functions and $a$ and $e$ corresponding bandwidths converging to $0$ with increasing sample size. For the sake of brevity we do not consider conditional weak convergence of the process $T^*$ in detail, but note that $E^*[T^*(\Theta, z)] = 0$ and under the null hypothesis $H_0$ (and under suitable regularity conditions) the conditional covariance $n\, \mathrm{Cov}^*(T^*(\Theta_1, y), T^*(\Theta_2, z))$ converges in probability to the covariance $\mathrm{Cov}(T(\Theta_1, y), T(\Theta_2, z))$ as defined in Theorem 3.2.

In our numerical investigations, it turned out that the asymptotic representation (3.1) for the process defined in (2.3) is not very accurate for small sample sizes. We thus considered a slightly modified version of this process, that is

$\tilde T_n(x, z) = \frac{1}{n} \sum_{i=1}^n (I\{Y_i < \hat q_\tau(X_i)\} - \hat\tau)\, I\{X_i < x\} (I\{Z_i < z\} - \hat F_Z(z)),$

where $\hat F_Z(z)$ denotes the empirical distribution function of $Z_1, \dots, Z_n$, which provided much better results for moderate sample sizes. As motivation for this approach, observe that under both the null hypothesis and the alternative, we have



$D_x := \frac{1}{n} \sum_{i=1}^n (I\{Y_i < \hat q_\tau(X_i)\} - \tau)\, I\{X_i < x\} = o_P(n^{-1/2}), \quad \hat\tau = \tau + o_P(n^{-1/2}),$

uniformly with respect to $x$ as can be seen by taking a closer look at the proofs of the main results in the Appendix. Thus the additional correction term



$S_{x,z} := D_x \hat F_Z(z) + \frac{\hat\tau - \tau}{n} \sum_{i=1}^n I\{X_i < x\} I\{Z_i < z\}$

vanishes asymptotically (uniformly with respect to $x, z$) under both the alternative and the null hypothesis. If, on the other hand, $S_{x,z}$ is relatively large because the sample size is small, the correction term $S_{x,z}$ induces an additional centering (the factor $\hat F_Z(z)$ corresponds to the amount of non-zero indicators $I\{Z_i < z\}$).

The simulation results described below confirm that this is a sensible approach. 
For the calculation of the test statistic

(4.2) $K_n = \sup_{x \in R_X} \sup_{z \in R_Z} |T_n(x, z)|$

based on the process $T_n$, we use local polynomial estimators of order two [see (2.4)]. The bandwidth $h_n$ of this estimator is chosen as $h_n := (\hat\sigma^2 / 2n)^{13/50}$ where $\hat\sigma^2$ denotes the variance estimate of Rice (1984) from the sample $\{(X_i, Y_i) \mid i = 1, \dots, n\}$ [see Yu and Jones (1997) for a related approach]. The bandwidths used in (2.5) and (4.1) are chosen as $d_n = a = e = h_n$, while the choice of $b_n$ in (2.7) is even less critical [see also Dette and Volgushev (2008)] and we use $b_n = h_n^3$. In fact, in the simulations it turned out that the power and size properties of the test are rather insensitive with respect to the bandwidth choice, see the table and related discussion in the next paragraph. The function $\omega$ in (2.6) is chosen as $\omega(x) := (15/32)(3 - 10x^2 + 7x^4)\, I\{|x| < 1\}$, which is a kernel of order 4 (its second moment vanishes while its fourth moment does not) [see Gasser et al. (1985)]. The function $\kappa$ in (2.7) is defined as the Epanechnikov kernel while all other kernels are Gaussian kernels. For the choice of the distribution function $G$ in (2.7) we follow the procedure described in Dette and Volgushev (2008) who suggested a normal distribution such that the 5% and 95% quantiles coincide with the corresponding empirical quantities of the sample $Y_1, \dots, Y_n$.
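The moment properties of the kernel $\omega(x) = (15/32)(3 - 10x^2 + 7x^4)$ on $[-1, 1]$ can be verified exactly. The following Python sketch is a small check and not part of the simulation code of the paper; the names `COEFFS`, `omega` and `moment` are hypothetical. It integrates the polynomial term by term with rational arithmetic.

```python
from fractions import Fraction

# Coefficients of omega(u) = (15/32) (3 - 10 u^2 + 7 u^4) on [-1, 1], keyed by power.
COEFFS = {0: Fraction(45, 32), 2: Fraction(-150, 32), 4: Fraction(105, 32)}


def omega(u):
    """Evaluate the kernel at u (zero outside [-1, 1])."""
    u = Fraction(u)
    if abs(u) > 1:
        return Fraction(0)
    return sum(c * u ** p for p, c in COEFFS.items())


def moment(j):
    """Exact value of the j-th moment, the integral of u^j omega(u) over [-1, 1]."""
    total = Fraction(0)
    for p, c in COEFFS.items():
        if (p + j) % 2 == 0:  # odd powers integrate to zero on [-1, 1]
            total += c * Fraction(2, p + j + 1)
    return total


for j in range(5):
    print(j, moment(j))
print(omega(1), omega(-1))
```

The kernel integrates to one, its moments of order one to three vanish and the fourth moment equals $-1/21$, so under the usual moment-based convention it is a kernel of order 4; it also vanishes at the boundary points $-1$ and $1$.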



4.1 Simulation results 

We simulate data from the location scale model

(4.3) $Y_i = q_j(X_i, Z_i) + s_k(X_i, Z_i)\varepsilon_i$, $\quad j, k = 1, \dots, 4$,

with the following quantile and scale functions

(4.4) $q_1(x,z) = \exp(2x^2)$, $\quad q_2(x,z) = (x - 0.5)^2$, $\quad q_3(x,z) = \exp(2x^2)\,z^2$, $\quad q_4(x,z) = \sin(2\pi(x + z))$

and

(4.5) $s_1(x,z) = 0.5(x + 0.2)$, $\quad s_2(x,z) = 0.5(\sin(x) + 1.2)$, $\quad s_3(x,z) = 0.5(z + 0.2)$, $\quad s_4(x,z) = 0.5\sqrt{(x + 0.2)(z + 0.2)}$.


The random variables $X$ and $Z$ are independent and uniformly distributed on the interval [0, 1] while $\varepsilon$ is standard normal. We consider the cases $\tau = 0.5$ and $\tau = 0.25$. All reported results are based on 1000 simulation runs with 300 bootstrap replications.
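The data generating mechanism just described can be sketched in a few lines. This is an illustration under stated assumptions: the dictionaries `Q` and `S` and the function `simulate` are hypothetical names, and $q_3$ is read as $\exp(2x^2)z^2$, which is an assumption because that formula is damaged in the source.

```python
import math
import random

# Location functions q_j and scale functions s_k of the location scale model.
Q = {
    1: lambda x, z: math.exp(2 * x**2),
    2: lambda x, z: (x - 0.5) ** 2,
    3: lambda x, z: math.exp(2 * x**2) * z**2,   # assumed reading of q_3
    4: lambda x, z: math.sin(2 * math.pi * (x + z)),
}
S = {
    1: lambda x, z: 0.5 * (x + 0.2),
    2: lambda x, z: 0.5 * (math.sin(x) + 1.2),
    3: lambda x, z: 0.5 * (z + 0.2),
    4: lambda x, z: 0.5 * math.sqrt((x + 0.2) * (z + 0.2)),
}


def simulate(j, k, n, rng):
    """Draw n triples (x, z, y) with y = q_j(x, z) + s_k(x, z) * eps."""
    data = []
    for _ in range(n):
        x, z = rng.random(), rng.random()   # independent uniforms on [0, 1)
        eps = rng.gauss(0.0, 1.0)           # standard normal error
        data.append((x, z, Q[j](x, z) + S[k](x, z) * eps))
    return data
```

For example, `simulate(1, 3, 50, random.Random(0))` draws one sample of size 50 from the model with location $q_1$ and scale $s_3$.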

The bootstrap test (at level $\alpha$) rejects the null hypothesis that the variable $Z$ is not significant, whenever

(4.6) $K_n > K^*_{n,1-\alpha}$,

where $K_n$ is defined in (4.2) and $K^*_{n,1-\alpha}$ denotes the $(1-\alpha)$ bootstrap quantile of the Kolmogorov-Smirnov test statistic.
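The rejection rule can be sketched as follows, given the observed statistic and a list of bootstrap statistics. This is a minimal sketch: the names `bootstrap_quantile` and `reject` are hypothetical, and taking the $\lceil (1-\alpha)B \rceil$-th order statistic as the bootstrap quantile is an assumed convention, since the text does not state which empirical quantile definition is used.

```python
import math


def bootstrap_quantile(boot_stats, alpha):
    """(1 - alpha) empirical quantile of the bootstrap statistics (order-statistic rule)."""
    ordered = sorted(boot_stats)
    b = len(ordered)
    # Small tolerance guards against floating point noise in (1 - alpha) * b.
    k = math.ceil((1.0 - alpha) * b - 1e-9)
    k = min(max(k, 1), b)
    return ordered[k - 1]


def reject(k_n, boot_stats, alpha):
    """Reject the null hypothesis when the observed statistic exceeds the bootstrap quantile."""
    return k_n > bootstrap_quantile(boot_stats, alpha)
```

In the setting of this section, `boot_stats` would hold 300 bootstrap values of the Kolmogorov-Smirnov statistic.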

The rejection probabilities of this test under the null hypothesis are shown in Table 1 for the 50% and 25% quantile. Note that different pairs of location and scale functions in (4.4) and (4.5) correspond to the null hypothesis for $\tau = 0.5$ and $\tau = 0.25$ (more precisely the models
                  α = 0.025           α = 0.05            α = 0.1
τ      (k,l)    n = 50   n = 100    n = 50   n = 100    n = 50   n = 100
0.5    (1,1)    0.037    0.035      0.053    0.061      0.102    0.111
       (1,2)    0.026    0.025      0.044    0.048      0.090    0.101
       (1,3)    0.041    0.027      0.069    0.066      0.132    0.127
       (1,4)    0.040    0.033      0.060    0.059      0.120    0.121
       (2,1)    0.036    0.031      0.068    0.057      0.122    0.106
       (2,2)    0.024    0.028      0.051    0.046      0.092    0.085
       (2,3)    0.037    0.025      0.057    0.059      0.132    0.114
       (2,4)    0.027    0.024      0.050    0.047      0.109    0.093
0.25   (1,1)    0.024    0.019      0.044    0.035      0.089    0.082
       (1,2)    0.024    0.019      0.044    0.037      0.089    0.092
       (2,1)    0.027    0.025      0.047    0.052      0.102    0.105
       (2,2)    0.016    0.022      0.036    0.048      0.089    0.101

Table 1: Simulated rejection probabilities of the bootstrap test (4.6) for significance of the variable z in the quantile regression model (4.3) for τ = 0.5 (upper part) and τ = 0.25 (lower part) under various null hypotheses. The pair (k, l) corresponds to the location function $q_k$ and scale function $s_l$ specified in (4.4) and (4.5), respectively.
                  α = 0.025           α = 0.05            α = 0.1
τ      (k,l)    n = 50   n = 100    n = 50   n = 100    n = 50   n = 100
0.5    (3,1)    0.999    1.000      1.000    1.000      1.000    1.000
       (3,2)    0.756    0.983      0.815    0.989      0.886    0.997
       (3,3)    0.997    1.000      0.999    1.000      0.999    1.000
       (3,4)    1.000    1.000      1.000    1.000      1.000    1.000
       (4,1)    0.082    0.197      0.142    0.311      0.252    0.519
       (4,2)    0.034    0.070      0.067    0.119      0.138    0.237
       (4,3)    0.089    0.176      0.134    0.279      0.226    0.488
       (4,4)    0.070    0.203      0.123    0.321      0.218    0.508
0.25   (1,3)    0.099    0.240      0.163    0.325      0.245    0.459
       (1,4)    0.044    0.078      0.086    0.133      0.155    0.225
       (2,3)    0.139    0.295      0.204    0.405      0.332    0.540
       (2,4)    0.06     0.089      0.106    0.152      0.176    0.232
       (3,1)    0.935    1.000      0.971    1.000      0.988    1.000
       (3,2)    0.464    0.857      0.591    0.913      0.725    0.954
       (3,3)    0.792    0.990      0.873    0.996      0.934    0.999
       (3,4)    0.900    1.000      0.948    1.000      0.975    1.000
       (4,1)    0.027    0.054      0.055    0.103      0.111    0.229
       (4,2)    0.019    0.031      0.034    0.061      0.078    0.132
       (4,3)    0.022    0.051      0.043    0.091      0.104    0.176
       (4,4)    0.021    0.054      0.053    0.093      0.104    0.195

Table 2: Simulated rejection probabilities of the bootstrap test (4.6) for significance of the variable z in the quantile regression model (4.3) for τ = 0.5 (upper part) and τ = 0.25 (lower part) under various alternatives. The pair (k, l) corresponds to the location function $q_k$ and scale function $s_l$ specified in (4.4) and (4.5), respectively.
                bandwidth
τ      (k,l)    0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
0.5    (1,2)    [entries illegible in the source]
       (3,2)    0.238  0.301  0.361  0.389  0.388  0.385  0.381  0.389  0.412  0.404
0.25   (1,2)    0.017  0.031  0.037  0.033  0.031  0.048  0.042  0.049  0.041  0.053
       (3,2)    0.113  0.160  0.210  0.210  0.237  0.250  0.262  0.246  0.262  0.260

Table 3: Simulated rejection probabilities of the bootstrap test (4.6) for various bandwidths. The sample size is n = 50 and the upper and lower part correspond to the 50% and 25% quantile, respectively. The pair (k, l) corresponds to the location function $q_k$ and scale function $s_l$ specified in (4.4) and (4.5), respectively.



α        0.025   0.050   0.100
q_1      0.026   0.042   0.096
q_2      0.998   1.000   1.000

Table 4: Simulated rejection probabilities of the bootstrap test (4.6) for the significance of a two dimensional predictor in median regression. The models are defined in (4.7), the sample size is n = 50 and the upper (lower) row corresponds to the null hypothesis (alternative).

defined by the pairs (1, 3), (1, 4), (2, 3) and (2, 4) correspond to the null hypothesis if $\tau = 0.5$ but to the alternative if $\tau = 0.25$). We observe from Table 1 that the level is usually approximated very well. For $\tau = 0.25$ there exist some cases where the test is slightly conservative. The corresponding results for various alternatives are displayed in Table 2 and we observe a reasonable power for most cases. The power for $\tau = 0.25$ is always smaller than the power for $\tau = 0.5$. This corresponds to intuition because the 25%-quantile is more difficult to estimate than the median. The power of the test is smaller for alternatives corresponding to the location function $q_4(x, z) = \sin(2\pi(x + z))$ if the sample size is n = 100. However, if the sample size is larger, the test also detects these alternatives with reasonable probability. For example, if n = 200 and $\tau = 0.5$ the simulated rejection probabilities of the bootstrap test at level 5% for the alternatives (4,2), (4,3) and (4,4) are given by 0.319, 0.795 and 0.821, respectively.
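When reading the simulated levels it helps to keep the Monte Carlo error in mind. With 1000 simulation runs, a rejection probability $p$ is estimated with standard error $\sqrt{p(1-p)/1000}$. The following sketch (the helper names are hypothetical, and the two-standard-error band is only a rough rule of thumb, not a criterion used in the paper) computes this quantity.

```python
import math


def mc_standard_error(p, runs):
    """Standard error of a simulated rejection frequency with true probability p."""
    return math.sqrt(p * (1.0 - p) / runs)


def within_two_se(observed, nominal, runs):
    """Check whether an observed frequency lies within two standard errors of the nominal level."""
    return abs(observed - nominal) <= 2.0 * mc_standard_error(nominal, runs)
```

For the nominal level 0.05 and 1000 runs the standard error is roughly 0.007, so simulated levels between about 0.036 and 0.064 are compatible with the nominal level under this rule.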
Next we study the impact of the choice of the bandwidth on size and power of the bootstrap test. For this purpose we consider the sample size n = 50 and bandwidths 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45 and 0.50. The results for model (1, 2) and (3, 2) corresponding to the null hypothesis and alternative, respectively, are summarized in Table 3. We observe that the level and power are rather stable with respect to different choices of the bandwidth. Simulations for other scenarios yield similar results and are not shown for the sake of brevity.
We conclude our numerical study with a brief investigation of a two dimensional predictor, say $Z = (Z_1, Z_2)$. Because the method proposed in this paper does not require smoothing in the Z-direction, the results should not be seriously affected if the dimension of Z is larger. To be precise we consider two different location functions

(4.7) $q_1(x, z_1, z_2) = x$, $\quad q_2(x, z_1, z_2) = z_2 \cdot x + z_1$

and a constant scale function $s(x, z_1, z_2) = 0.5$ in model (4.3). Note that $q_1$ corresponds to the null hypothesis, while $q_2$ represents an alternative. The results of the bootstrap test for the median are listed in Table 4 for the sample size n = 50 and we observe in these examples similar satisfactory properties as in the one-dimensional setting.
A Appendix: Proofs

Throughout this section, introduce the abbreviation $\Theta_n := \Theta \cap \mathcal{D}_n$ with $\mathcal{D}_n := \{x : [x - h_n, x + h_n] \subset R_X\}$.



Lemma A.1 If assumptions (K1) - (K6) and (A1) - (A3) are satisfied, then

$$\hat q_\tau(x) = q_\tau(x) - \frac{1}{f_{\varepsilon|X}(0|x)} \int \kappa(v) A_s(q_{\tau+vb_n}(x)|x)\,dv + o_P(n^{-1/2}) =: \hat q_{\tau,L}(x) + o_P(n^{-1/2})$$

uniformly in $x \in \mathcal{D}_n$, where $A_s(y|x)$ is defined in Lemma B.1 and has the property

$$\sup_{v \in [-1,1],\, x \in \mathcal{D}_n} |A_s(q_{\tau+vb_n}(x)|x)| = O_P\Big(d_n^s + \Big(\frac{\log n}{n h_n^d}\Big)^{1/2}\Big).$$

Moreover, $\hat q_{\tau,L}(x)$ is, with probability tending to one, d + 1 times continuously differentiable with derivatives bounded uniformly on $\mathcal{D}_n$.

Proof. Apply part (a) of Lemma B.4 to $F_{Y|X}(\cdot|x)$ and part (c) of the same Lemma with $F_1(\cdot|x) = F_{Y|X}(\cdot|x)$, $F_2(\cdot|x) = \hat F_{Y|X}(\cdot|x;p)$. Combining the results with Lemma B.1 yields the assertion. □



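Lemma A.2 If assumptions (K1) - (K6), (A1) - (A4), (S1) and (S2) are satisfied, then

$$\int f_{\varepsilon|X}(0|s)(\hat q_\tau(s) - q_\tau(s)) I\{s \in \Theta_n\} f_X(s) F_{Z|X,\varepsilon}(z|s,0)\,ds = -\frac{1}{n}\sum_{i=1}^n (I\{\varepsilon_i \le 0\} - \tau) I\{X_i \in \Theta_n\} F_{Z|X,\varepsilon}(z|X_i,0) + o_P\Big(\frac{1}{\sqrt{n}}\Big)$$

uniformly with respect to $\Theta \in \Xi$, $z \in R_Z$.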
Proof. From Lemma A.1 we obtain the representation

$$-\int f_{\varepsilon|X}(0|s)(\hat q_\tau(s) - q_\tau(s)) I\{s \in \Theta_n\} f_X(s) F_{Z|X,\varepsilon}(z|s,0)\,ds = \int \kappa(v) \int A_s(q_{\tau+vb_n}(s)|s) I\{s \in \Theta_n\} f_X(s) F_{Z|X,\varepsilon}(z|s,0)\,ds\,dv + o_P(n^{-1/2})$$



£"< , >/ffi? M W( n 



dm. 



F Y \x{qr+vb n {s)\Xi) 



x (K ftni o(s - Xi), K hnMNp p (s - Xi 
xl{s e e n }(/ 1 (X i ; 6 n , /i n ) + J 2 (X <; e n , /i„))/x(s)i^| X , e (z| S , 0)dsd« + op^" 1 / 2 ), 



where 



and 



i=0 • /AV y l<|m|<n/ ^ V ; 



$$I_1(X; \Theta_n, h_n) := I\{\otimes_{j=1}^d [X(j) - h_n, X(j) + h_n] \subset \Theta_n\},$$

$$I_2(X; \Theta_n, h_n) := I\{\exists j : [X(j) - h_n, X(j) + h_n] \not\subset \Theta_n,\ \otimes_{j=1}^d [X(j) - h_n, X(j) + h_n] \cap \Theta_n \neq \emptyset\}.$$

We will now proceed to show that the first part in the above decomposition [i.e. the part containing $I_1$] determines the asymptotic expansion and establish at the end of the proof that the part corresponding to $I_2$ is asymptotically negligible. First, note that



qr+vbn(s) - Y l 



FY\x{qr+vb n {s)\Xi) 



x (K hnfi (s - Xi), K hn>kNpip {s - X^Iis e e n }h(Xi] 6 n , h n )f x {s)F zlx>e (z\s, 0)ds 

x fK 1|Q (», K likjVp p (s) \ h(Xi] 6 n , h n )f x (Xi + sh n )F z \ X)£ (z\Xi + sh n , 0)ds. 

Observe that every entry of M is by assumption continuously differentiable with respect to s and the derivative is uniformly bounded. The class of functions defined by



{fay) 



q c (x + a) -y 



\a{j)\<l,j = l,...,d,\(-T\<a} 



where $\alpha$ is a small positive number, has covering numbers that satisfy the assumptions of part 1 of Lemma B.3 in Appendix B. This follows from Lemma B.2 together with the fact that under the assumptions (A1), (A3) the mapping $(\zeta, a) \mapsto q_\zeta(x + a)$ satisfies

$$\sup_x |q_{\zeta_1}(x + a_1) - q_{\zeta_2}(x + a_2)| \le C(|\zeta_1 - \zeta_2| + \|a_1 - a_2\|_\infty)$$

for some finite constant C (this inequality is a consequence of the implicit function theorem).
Moreover, it follows from the smoothness assumptions on $F_{Y|X}$ and the properties of $\Omega$ that

$$\sup_{|v| \le 1,\, |s| \le 1} \Big| E\Big[\Omega\Big(\frac{q_{\tau+vb_n}(X_i + sh_n) - Y_i}{d_n}\Big) - F_{Y|X}(q_{\tau+vb_n}(X_i + sh_n)|X_i) \,\Big|\, X_i\Big] \Big| \le R_n \quad a.s.$$

where $R_n$ is a nonrandom quantity of order $o(1/\sqrt{n})$. Thus the smoothness properties of $F_{Z|X,\varepsilon}$, $F_{Y|X}$ and $(\zeta, x) \mapsto q_\zeta(x)$ imply that by Lemma B.2 and Lemma B.3 in Appendix B we have

^M(X + M(^( gr+t,bra(X ^^ n) ~ i; ' ) - + sMX)) 

i 

x(Ki >0 (s), K likjVp p (s))*Ji(X i; e„, h n )f x (Xi + shrjFzpMXi + sh m 0) 
= ^ X)M(Xi) (k 1)0 (s), K likjVjjp (s)) V{X; g ej/xTOiv^ix^o) 

i 

uniformly with respect to $|v| \le 1$, $s \in [-1, 1]^d$, $\Theta \in \Xi$ and $z \in R_Z$. Finally, noting that
/ g r+t , fcn (Xj + gfe ra ) - Yj \ _ ^/q T+vbn (Xi + s/i n ) - q T (Xi) - e,- 



yields 



sup 



Qf ftWgi+i^) ^ )_ /{ -.< 0} <||n||. x /{|.-,-|</?„} r/..s.. 



where $R_n = O(h_n + b_n + d_n)$ is a non-random quantity. This, together with an application of Lemma B.3, shows that

i^M(X)(K 1 , ( s ),...,K 1 , kjVp ^( s )) t /{X e e4/x(X)F Z | Xi£ (^|X,0) 

Qr+vbn 



n 



d„ 



FY\x(qr+vb n ( X i + sh n )\Xi) 



= \Y, M (^)( J fe ^ °) - ^|x(0|X))(K li0 ( S ), K likjVp »)< 

i 

x/{X G e n }/x(X)F Z | Xi£ (^|X,0) + o P (n~ 1 / 2 ). 
In particular, noting that $F_{\varepsilon|X}(0|X_i) = \tau$, the above result implies

J fe\x(0\s)(q T (s)-qAs))I{s G e n }f x {s)F zlXi£ {z\s,0)ds 
= i^M(X)(/{^ < 0} -T)( f i (K),...,^ Np jK)) t I{X l G e n }/x(X)F Z | Xi£ (^|X,0) 

i 

+o P (n-^), 
where $\mu_{\mathbf{k}}(K) := \int_{\mathbb{R}^d} K_{1,\mathbf{k}}(u)\,du$. Now from the definition of M it is easy to see that

$$M(x) = e_1^t\big(M_0(x)^{-1} + h_n R_M(x)\big) = e_1^t\Big(\frac{\mathcal{M}(K)^{-1}}{f_X(x)} + h_n R_M(x)\Big)$$

where $R_M$ denotes a vector whose entries are uniformly bounded and Lipschitz-continuous with respect to x. Thus applying Lemma B.3 we obtain

i^MpQX/fc < 0}-T)( f i (K),..., f i kNpp (K)) t I{X i G Q n }f x {X t )F z \ X M^) 

i 

1 n 

- J2(I{ei < 0} - r)I{Xi G Q n }F zlXi£ (z\X, t , 0) + op^ 



l/2\ 



n . 
i=i 



which completes the first part of the proof. 
It remains to show that 



^UX^K) J\{v) J ±M(s) (^( qT+Vbn{ ^ Yi ) ~ Fy^q^is)^) 

x (K hnt0 (s - Xi), K hn>UNp>p (s - Xi)yi{s G & n }f x (s)F zlx , £ (z\s, 0)dsdv = op^ 1 / 2 ) 

uniformly with respect to $\Theta \in \Xi$, $z \in R_Z$. To this end, consider the (n-dependent) class of functions $\mathcal{F}_n$ with elements

U nMn (x,y) = J\v) j ^^(^( ^j^^ - Fy lX (qr +vbn (s)\x)) 

x (K hnfi (s - x), K hn)kNpp (s - x)Yl{s G Q n }fx{s)F z \ x , £ {z\s, 0)dsdv 

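The class $\mathcal{F}_n$, indexed by $z \in \mathcal{Z}$ and $\Theta \in \Xi$, contains uniformly bounded elements (the bound is also uniform with respect to n). Moreover, there exists a finite positive constant C such that

(A.1) $N_{[\,]}(\mathcal{F}_n, \varepsilon, L^2(P_X)) \le \big(N_{[\,]}(\mathcal{F}_{n,1}, \varepsilon/C, L^2(P_X))\, N_{[\,]}(\mathcal{F}_{n,2}, \varepsilon/C, L^2(P_X))\big)^2$,

where $\mathcal{F}_{n,1} := \{s \mapsto I\{s \in \Theta_n\} \mid \Theta \in \Xi\}$ and $\mathcal{F}_{n,2} := \{s \mapsto F_{Z|X,\varepsilon}(z|s,0) \mid z \in \mathcal{Z}\}$. To see that this holds, observe the decomposition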

fz,e n ,h n ,b n ( x ,y) — fi,e„,h n ,b n ( x ^y) + fz,e n ,h n ,b n ( x >y) 

:= j^Ylf J K ( V ) I {\\ X _ s \\oo < h n }fx(s)gj, n (x,y,s,v)I{s G Q n }F Z \ Xt£ (z\s,0)dsdv 

where $g_{1,n}$ and $g_{2,n}$ denote non-positive and non-negative, uniformly bounded functions, respectively. Moreover, the functions $g_{j,n}$ do not depend on $\Theta_n$ or z. Obviously, it suffices to bound the bracketing numbers of $\mathcal{F}_n^j := \{(x, y) \mapsto f^j_{z,\Theta_n,h_n,b_n}(x, y)\}$ for j = 1, 2 separately. Denote by $\{[b_{L,j}, b_{U,j}]\}$ a collection of $\varepsilon$-brackets (with respect to $L^2(P_X)$) for $\{s \mapsto I\{s \in \Theta_n\} F_{Z|X,\varepsilon}(z|s, 0)\}$. Then a collection of $\varepsilon/C$ brackets for $\mathcal{F}_n^2$ (with respect to $L^2(P_{X,Y})$) is given by

B K j{x,y) := ^ y y «(v)/{||a;- s||oo < h n }f x (s)g 2 , n {x,y, s,v)b K j(s)dsdv, K = U,L. 
To see this, observe that



^[{B^X^) - B Ud {X x ,Y x )f] 
- J J J dln^^y, 3 ^)^^) 1 ^ - s \\oo < h n }f x (s)K(v)dsdv 

< C\ J fx(s)(bu,j(s) - b LJ (s)) 2 J -^I{\\x - sW^ < h n }f x (x)dxds 
for some finite constant $C_1$. A bound for the remaining class can be derived by similar arguments. Thus (A.1) is established. Combining the bound in (A.1) with the assumptions (S1) and (S2), the estimate $\sup_{z,\Theta} |E[f_{z,\Theta_n,h_n,b_n}(X_1, Y_1)]| = o(n^{-1/2})$, and the results from Lemma B.2 and Lemma B.3 yields the assertion after noting that by assumption $\sup_{\Theta \in \Xi} E I_2(X_i; \Theta_n, h_n) = o(1)$. □

Lemma A.3 Under the assumptions of Theorem 3.2 it holds that

$$T_n(\Theta_n, z) = \frac{1}{n}\sum_{i=1}^n (I\{\varepsilon_i \le 0\} - \tau) I\{X_i \in \Theta_n\} I\{Z_i \le z\} + o_P(n^{-1/2}) + \int \big(F_{\varepsilon|X,Z}(\hat q_{\tau,L}(s) - q_\tau(s)|s,t) - F_{\varepsilon|X,Z}(0|s,t)\big) I\{s \in \Theta_n\} I\{t \le z\}\,dF_{X,Z}(s,t),$$

uniformly with respect to $\Theta \in \Xi$, $z \in R_Z$, where $F_{X,Z}$ denotes the joint distribution function of X, Z.

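Proof. Note that $T_n(\Theta, z) = \frac{1}{n}\sum_{i=1}^n (I\{\hat\varepsilon_i \le 0\} - \tau) I\{X_i \in \Theta\} I\{Z_i \le z\}$, and that the assertion is equivalent to

$$\sup \Big| \frac{1}{n}\sum_{i=1}^n (I\{\hat\varepsilon_i \le 0\} - I\{\varepsilon_i \le 0\}) I\{X_i \in \Theta_n\} I\{Z_i \le z\} - E\big[(I\{\varepsilon_L \le 0\} - I\{\varepsilon \le 0\}) I\{X \in \Theta_n\} I\{Z \le z\} \,\big|\, (Y_i, X_i, Z_i)_{i=1,\dots,n}\big] \Big| = o_P\Big(\frac{1}{\sqrt{n}}\Big).$$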

Here we define $\hat\varepsilon_i = Y_i - \hat q_\tau(X_i)$, $\varepsilon_L = Y - \hat q_{\tau,L}(X)$, where we assume that the sample $(Y_i, X_i, Z_i)$, $i = 1, \dots, n$, (used to build $\hat q_{\tau,L}$) is independent from the generic variable (Y, X, Z). The proof now proceeds in two steps. First, note that by Lemma A.1 we have $\hat q_\tau - \hat q_{\tau,L} = o_P(n^{-1/2})$ uniformly on $\mathcal{D}_n$ and thus there exists a deterministic sequence $\gamma_n = o(n^{-1/2})$ with

(A.2) $P\big(\sup_{x \in \mathcal{D}_n} |\hat q_\tau(x) - \hat q_{\tau,L}(x)| \le \gamma_n\big) \to 1.$

Now on the set $\{\sup_x |\hat q_\tau(x) - \hat q_{\tau,L}(x)| \le \gamma_n\}$, the probability of which tends to one, we have, with $\hat\varepsilon_{i,L} := Y_i - \hat q_{\tau,L}(X_i)$,

$$\sup_{\Theta \in \Xi,\, z \in \mathcal{Z}} \Big| \frac{1}{n}\sum_{i=1}^n (I\{\hat\varepsilon_i \le 0\} - I\{\hat\varepsilon_{i,L} \le 0\}) I\{X_i \in \Theta_n\} I\{Z_i \le z\} \Big| \le \frac{1}{n}\sum_{i=1}^n I\{|\hat\varepsilon_{i,L}| \le \gamma_n\} I\{X_i \in \mathcal{D}_n\}.$$

Next, note that $I\{|\hat\varepsilon_{i,L}| \le \gamma_n\} = I\{|\varepsilon_i - g(X_i)| \le \gamma_n\}$ for $g = \hat q_{\tau,L} - q_\tau$. Now the assertion follows since the (n-dependent) class of functions

$$\{(e, \xi) \mapsto I\{|e - g(\xi)| \le \gamma_n\} I\{\xi \in \mathcal{D}_n\} \mid g \in C_1^{d+1}(\mathcal{D}_n)\}$$

satisfies the assumptions of part 1 of Lemma B.3 whenever n is sufficiently large, see the proof of Lemma A.3 in Neumeyer and Van Keilegom (2010) for a similar reasoning, and $\hat q_{\tau,L} - q_\tau \in C_1^{d+1}(\mathcal{D}_n)$ with probability converging to one by Lemma A.1. Here $C_1^{d+1}(\mathcal{D}_n)$ is the class of d+1 times differentiable functions g defined on $\mathcal{D}_n$. Further, note that



sup E 

geCf +1 (X>„) 



ol n 



I{\e t -g(X t )\< ln }I{X t eV n } 
This, together with (A.2) and an application of Lemma B.3, shows that



1/21 



sup 



1 n 

1=1 



op(n-^). 



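Similar arguments applied to the (n-dependent) class of functions

$$\{(e, \xi, \zeta) \mapsto (I\{e \le g(\xi)\} - I\{e \le 0\}) I\{\xi \in \Theta_n\} I\{\zeta \le z\} \mid g \in C_1^{d+1}(\mathcal{D}_n),\ \Theta \in \Xi,\ z \in \mathcal{Z}\}$$

yield

$$\sup \Big| \frac{1}{n}\sum_{i=1}^n (I\{\hat\varepsilon_{i,L} \le 0\} - I\{\varepsilon_i \le 0\}) I\{X_i \in \Theta_n\} I\{Z_i \le z\} - E\big[(I\{\varepsilon_L \le 0\} - I\{\varepsilon \le 0\}) I\{X \in \Theta_n\} I\{Z \le z\} \,\big|\, (Y_i, X_i, Z_i),\ i = 1, \dots, n\big] \Big| = o_P\Big(\frac{1}{\sqrt{n}}\Big)$$

and thus the proof is complete. □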



Proof of Theorem 3.2. Starting from the stochastic expansion given in Lemma A.3 we obtain by Taylor's expansion

$$T_n(\Theta_n, z) = \frac{1}{n}\sum_{i=1}^n (I\{\varepsilon_i \le 0\} - \tau) I\{X_i \in \Theta_n\} I\{Z_i \le z\} + \int f_{\varepsilon|X,Z}(0|s,t)(\hat q_\tau(s) - q_\tau(s)) I\{s \in \Theta_n\} I\{t \le z\}\,dF_{X,Z}(s,t)$$
$$\qquad + \int f'_{\varepsilon|X,Z}(\xi_{s,n}|s,t)(\hat q_\tau(s) - q_\tau(s))^2 I\{s \in \Theta_n\} I\{t \le z\}\,dF_{X,Z}(s,t) + o_P\Big(\frac{1}{\sqrt{n}}\Big)$$

for some $\xi_{s,n}$ between 0 and $\hat q_\tau(s) - q_\tau(s)$, where the last line is of order $o_P(n^{-1/2})$ due to Lemma A.1 and the assumptions $\sup |f'_{\varepsilon|X,Z}(y|x,z)| < \infty$, $d_n^{2s} + \log n/(n h_n^d) = o(n^{-1/2})$. Note that

$$\int f_{\varepsilon|X,Z}(0|s,t)(\hat q_\tau(s) - q_\tau(s)) I\{s \in \Theta_n\} I\{t \le z\}\,dF_{X,Z}(s,t) = \int F_{Z|X,\varepsilon}(z|s,0) f_{\varepsilon|X}(0|s) f_X(s)(\hat q_\tau(s) - q_\tau(s)) I\{s \in \Theta_n\}\,ds.$$

By Lemma A.2 we thus have

$$T_n(\Theta_n, z) = \frac{1}{n}\sum_{i=1}^n (I\{\varepsilon_i \le 0\} - \tau) I\{X_i \in \Theta_n\}\big(I\{Z_i \le z\} - F_{Z|X,\varepsilon}(z|X_i, 0)\big) + o_P\Big(\frac{1}{\sqrt{n}}\Big).$$

This completes the proof. □



Proof of Corollary 3.3 and 3.5. Define the sequence of n-dependent classes of functions

?n ■= {(e,£,C) ^ el{i G 6 n V n }(I{( <z}- F zlXiE (z\Z,0)) | 9 G E,z G R z ) 

and note that it is indexed by the totally bounded metric space $(\Xi \times R_Z, \rho)$ with metric

$$\rho((\Theta_1, y), (\Theta_2, z)) := \big(E[(W_{\Theta_1,y} - W_{\Theta_2,z})^2]\big)^{1/2}$$

where $W_{\Theta,z} := (I\{\varepsilon_1 \le 0\} - \tau) I\{X_1 \in \Theta\}\big(I\{Z_1 \le z\} - F_{Z|X,\varepsilon}(z|X_1, 0)\big)$. Moreover, it satisfies the assumptions of part 2 of Lemma B.3. A simple calculation in combination with the assumption $\sup_{\Theta \in \Xi} P(X_i \in \Theta \setminus \Theta_n) = o(1)$ shows that all the assumptions of Theorem 2.11.23 in van der Vaart and Wellner (1996) are satisfied. In particular, the covariances $\mathrm{Cov}(W_{\Theta_n,y}, W_{\Theta'_n,z})$ converge to $k(\Theta, y, \Theta', z)$ given in Corollary 3.3. This implies that the process
\/n( T n (9 n , z) - T n (9 n , z) 



= - J2 ((I{ei < 0} - r)I{X t G 9 n }(/{Z J < z} - F m M^ 0)) ~ f n (B n , z)) + o p {-=). 
i=i v 

where f n (G n ,z) := e[(J{^ < 0} - r)/{X, G 9j(I{Z i < 2} - F^^Xi, 0))] converges 
weakly to the centered Gaussian process $T(\Theta_n, z)$ described in Corollary 3.3. Thus Corollary 3.3 and 3.5 follow after a straightforward calculation of this expectation. Now the proof is complete. □



B Technical results 

Before stating the main results of this section, we discuss some basic properties of the local polynomial estimator $\hat F_{Y|X}(y|x;p)$. To this end, we note that

"iTurv i\r t„ „.\ t/ i ™ „.\ \r 1 „ „.\\t 



X'WY = (V nfi (x, y), K,k M (x, y), 2/))* 



with 



/in' " 
Lemma B.1 Under the assumptions (K1), (K2), (K5), (A1), (A2) it holds that
FY\x(y\x;p) - F Y \x(y\x) 



j=0 

+o P (n 



l<\m\<n f 



-l/2\ 



, A s (y\x) + o P (n-^) = P (d s n + ( ^) ^ 



uniformly with respect to (x, y) G T> n x where y is any bounded subset of K and denote 
some matrices with uniformly bounded entries that are independent ofx,n,y and 



»)^E K ^-*)(«(^)-w. 



Moreover, the quantity $A_s(y|x)$ is, with probability tending to one, d + 1 times continuously differentiable with respect to x and y and all its partial derivatives of corresponding orders are uniformly bounded on $\mathcal{D}_n \times \mathcal{Y}$.

Proof. At the end of the proof, we will establish the following two representations 
(B.$ Ylx (y\x;p) = F Y]x (y\x) + e^X'WX)- 1 ^^^, 2/), KT n , k s {x, y)) 1 + P (^ +1 ), 



(X i WX)- 1 = H- 1 (^]( 



j'=o ^ v 7 1<HI<»/ 



X 



iM(K)- 1 
fx(x) 



+ l N >cNOp(h n n f ))H- 1 



(B.2) 

where Mq, M kNn n ^ denote some matrices that do not depend on n,x, Mq = Ai(K) is 

invertible, H is a diagonal matrix with entries 1, h n , .., h n , h 2 w h\, h p w h p n and the term 
hb^ appears times in this vector. By definition we have 



d r y d™T ntkiS (x,y) = (/ 
nh n 

and tedious but straightforward calculations including integration-by parts and substitutions 
yield the estimates 



sup E[d r y d™T n , KS (x,y)} = 0{d 

{x,y)£V„x 



s— r\ 
n )i 



sup n(d;d™T n ^y)) 2 } = o y d+2|m| 



1 



A combination of parts 1, 2 and 6 of Lemma B.2 shows that, for every n, the class of functions



T n = \ (u, v) ^ K^ k (x -u)(-^u 



(r-DfV- V \ _ F W 



dr. 



x e R x ,y e 
satisfies the assumptions of part 2 of Lemma B.3 with constants not depending on n, which, together with the above estimates gives

log n \ 1 / 2 



(B.3) 



sup \d r v d™T nXS (x,y)\ = Op 



- ri ^ wrJ!) /l ~x- 1 d +2[m[ ,0V(2r-l) 



o(d s - r ). 



Combining (B.1), (B.2) and (B.3) yields



HE 



;|l|<n/ 



and thus the proof of the first part of the Lemma is complete.

For a proof of the differentiability results, note that the (d + 1)-fold differentiability of the product of every entry of a scalar product between two vectors follows from the (d + 1)-fold differentiability of every entry of both vectors. This establishes that $A_s(y|x)$ is d + 1 times continuously differentiable with respect to both components and that all partial derivatives are uniformly bounded. By the results in (B.3) the proof is thus complete once we establish (B.1) and (B.2).

Proof of (B.1). A Taylor expansion of $F_{Y|X}(y|x)$ gives
i K ^k(a: - J>Q)Fy| X (y|x) 



— y 



0<|m|<p 

This fact, combined with 
e*(X*WX 



m! 



-i 



nh d n 

yields the representation 



/{m = 0}, 



Fy\x{v\x) 



e'^X'WX)- 1 
nht 



nh d 



\KY,iKh n ,k Np , p (x - Xi)F Y \ x {y\x) ) 

( KYsiKu^x-X^Fy^Xi) \ 

[K^Kh^Jx - X t )F Ylx (y\X t ) ) 
once we note that $\frac{1}{nh_n^d}\sum_i |K_{h_n,\mathbf{k}_{N_p},p}(x - X_i)| = O_P(1)$ and $e_1^t(X^tWX)^{-1} = (O_P(1), \dots, O_P(1))$ [see the last part of the proof]. Thus

F Y \ X {y\x) = F Y{x {y\x) + e^X'WX)- 1 ^^, -> C^k^.sO, 1/))* + P (^ +1 ). 



Proof of (B.2). The elements of the matrix $X^tWX$ are of the form

1 /i |m| 



where m = ni! + m 2 and nix , m 2 denote the fc'th and I'th entry in the tuple of vectors 
(0, k^i, kjVi.i, ki t 2, k^j,), respectively. In particular, d+1 + n/-fold continuous differ- 
entiability of /x implies that 

^E K ^-^)= £ ^Hi|(K)^ l ^ ) (x)+Op((^) 1/2 + C). 

n i |l|<n/ n 

Thus we obtain a representation of the form 

X'WX = H( £ hl n lM yfx(x) + 1tvxtvOp(C))h 
lll<«/ 

where $M_0, \dots, M_{\mathbf{k}_{N_{n_f}}, n_f}$ denote some matrices that do not depend on n, x, $M_0 = \mathcal{M}(K)$ is invertible and H is a diagonal matrix with entries $1, h_n, \dots, h_n, h_n^2, \dots, h_n^2, \dots, h_n^p, \dots, h_n^p$ where the term $h_n^k$ appears $N_k$ times in this vector [see the definition at the beginning of the section]. Thus for $h_n$ sufficiently small an application of the Neumann series yields (B.2) with probability tending to one. □
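Lemma B.2 (Bounds on bracketing numbers)

1. Define $\mathcal{F} + \mathcal{G} := \{f + g \mid f \in \mathcal{F}, g \in \mathcal{G}\}$, $\mathcal{F}\mathcal{G} := \{fg \mid f \in \mathcal{F}, g \in \mathcal{G}\}$. Then

$$N_{[\,]}(\mathcal{F} + \mathcal{G}, \varepsilon, \rho) \le N_{[\,]}(\mathcal{F}, \varepsilon/2, \rho)\, N_{[\,]}(\mathcal{G}, \varepsilon/2, \rho).$$

If additionally the classes $\mathcal{F}$, $\mathcal{G}$ are uniformly bounded by the constant C, we have

$$N_{[\,]}(\mathcal{F}\mathcal{G}, \varepsilon, \|\cdot\|) \le N_{[\,]}^2(\mathcal{F}, \varepsilon/4C, \|\cdot\|)\, N_{[\,]}^2(\mathcal{G}, \varepsilon/4C, \|\cdot\|)$$

for any seminorm $\|\cdot\|$ with the additional property that $|f_1| \le |f_2|$ implies $\|f_1\| \le \|f_2\|$.

2. Let $\mathcal{F}_n$ denote a class of functions $f_x$ indexed by the bounded interval $x \in [-A, A]$ which are bounded by a given constant and have support of the form $[x - h, x + h]$. If $\sup_{f \in \mathcal{F}_n} |f(a) - f(b)| \le C|a - b|h^{-k}$ for some universal constant C, we have $N_{[\,]}(\mathcal{F}_n, \varepsilon, L^2(P_X)) \le \gamma\varepsilon^{-(2k+1)}$ provided that $P_X$ has a uniformly bounded density. Here $\gamma$ denotes a constant which does not depend on n.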
3. Consider the class of functions

where $\Omega$ is Lipschitz-continuous and there exist constants $C_1, C_2$ such that $\Omega$ is constant on $(-\infty, C_1]$ and $[C_2, \infty)$. Assume additionally that the distribution of (X, Y) has a uniformly bounded density, then

$$N_{[\,]}(\mathcal{F}_n, \varepsilon, L^2(P_{X,Y})) \le C_5 N_{[\,]}(\mathcal{G}, C_6\varepsilon^2, \|\cdot\|_\infty)$$

for some constants $C_5, C_6$ independent of n.

4. For any measure P on the unit interval with uniformly bounded density f, the class of functions

$$\mathcal{F} := \{u \mapsto I\{u \le s\} \mid s \in [0, 1]\} \cup \{u \mapsto I\{u < s\} \mid s \in [0, 1]\}$$

can be covered by $C\varepsilon^{-2}$ brackets of $L^2(P)$ length $\varepsilon$.

5. For any measure P on $\mathbb{R} \times \mathbb{R}^k$ with uniformly bounded conditional density $f_{V|U}$ the class of functions

$$\mathcal{G} := \{(u, v) \mapsto I\{v \le f(u)\} \mid f \in \mathcal{F}\}$$

satisfies $N_{[\,]}(\mathcal{G}, \varepsilon, \|\cdot\|_{P,2}) \le N_{[\,]}(\mathcal{F}, C\varepsilon^2, \|\cdot\|_\infty)$ for some constant C independent of $\varepsilon$.

6. Assume that f(x; a) is a function indexed by the parameter $a \in A$ such that $\sup_x \|f(s; x) - f(t; x)\|_\infty \le C\|s - t\|^\theta$ for some $\theta > 0$ and norm $\|\cdot\|$. Then the $\|\cdot\|_\infty$-bracketing numbers of the class of functions $\mathcal{F} = \{u \mapsto f(u; a) \mid a \in A\}$ satisfy $N_{[\,]}(\mathcal{F}, \varepsilon, \|\cdot\|_\infty) \le C_1 N(A, C_2\varepsilon^{1/\theta}, \|\cdot\|)$ for some finite constants $C_1, C_2$.

Proof.

Part 1. The first assertion is obvious from the definition of bracketing numbers. For the second assertion, note that $\mathcal{F}\mathcal{G} = (\mathcal{F} + C)(\mathcal{G} + C) - C\mathcal{F} - C\mathcal{G} - C^2$. Moreover, all elements of the classes $\mathcal{F} + C$, $\mathcal{G} + C$ are by construction non-negative and thus it is also possible to cover them with brackets consisting of non-negative functions and amounts equal to the brackets of $\mathcal{F}$, $\mathcal{G}$, respectively. Finally, observe that if $0 \le f_l \le f \le f_u$ and $0 \le g_l \le g \le g_u$, we also have $f_l g_l \le fg \le f_u g_u$. Moreover $\|f_l g_l - f_u g_u\| \le C\|f_u - f_l\| + C\|g_u - g_l\|$. Thus the class $(\mathcal{F} + C)(\mathcal{G} + C)$ can be covered by at most $N_{[\,]}(\mathcal{F}, \varepsilon, \|\cdot\|)\, N_{[\,]}(\mathcal{G}, \varepsilon, \|\cdot\|)$ brackets of length $2C\varepsilon$. Finding brackets for the classes $C\mathcal{F}$, $C\mathcal{G}$ is trivial, and applying the first assertion of the Lemma completes the proof.

Part 2. Consider two cases.

A) $\varepsilon > 4h^{1/2}$: Divide [0, 1] into $N := 2/\varepsilon^2$ subintervals of length $2a := \varepsilon^2$ with centers $ra$ for $r = 1, \dots, N$ and call the intervals $I_1, \dots, I_N$. Note that two adjacent intervals overlap by $a > 2h$. This construction ensures that every set of the form $[x - h, x + h]$ with $x \in [h, 1 - h]$ is completely contained in at least one of the intervals defined above. Then a collection of N brackets of $L^2$-length $D\varepsilon$ for some $D > 0$ independent of h is given by $(-C I\{u \in I_j\}, C I\{u \in I_j\})$.

B) $\varepsilon \le 4h^{1/2}$: Observe that by assumption any element g of $\mathcal{F}$ satisfies $|g(x) - g(y)| \le C|x - y|h^{-k}$. Consider the points $t_i := i/(N + 1)$, $i = 1, \dots, N$ with $N := 2^{2k+1}C/\varepsilon^{2k+1}$. By construction, to every $x \in [h, 1 - h]$ there exists $i(x)$ with $|t_{i(x)} - x| \le \varepsilon^{2k+1}/(2^{2k+1}C)$. This implies

$$|g(x) - g(t_{i(x)})| \le C\varepsilon^{2k+1}h^{-k}/(2^{2k+1}C) \le \varepsilon/2.$$

Then N $\|\cdot\|_\infty$-brackets of length $\varepsilon$ covering $\mathcal{F}$ are given by $(g(t_i) - \varepsilon/2, g(t_i) + \varepsilon/2)$, $i = 1, \dots, N$. From those one can easily construct $L^2(P_X)$-brackets.

Part 3. Without loss of generality, assume that $\Omega$ equals one on $[1, \infty)$ and zero on $(-\infty, -1]$. Moreover, the assumptions on $\Omega$ imply the existence of finite constants $C_l, C_u$ such that $C_l \le \Omega \le C_u$. Distinguish two cases. A) $\varepsilon \le d_n$: Starting with $\varepsilon^2$ supremum norm brackets for the class $\mathcal{G}$ and using the Lipschitz condition yields the desired brackets. B) $\varepsilon > d_n$: Denote by $[g_{1,l}, g_{1,u}], \dots, [g_{N(\varepsilon),l}, g_{N(\varepsilon),u}]$ brackets for the class $\mathcal{G}$ of $\|\cdot\|_\infty$-size $\varepsilon$. For a function $g \in \mathcal{G}$, denote the bracket that contains it by $[g_{j(g),l}, g_{j(g),u}]$. Observe that

if V> 9j(g)A x ) + d n 
if V < 9j{g)A X ) ~ d n 

else 

Thus brackets of the form

$$b_{l,j}(x) := I\{y \le g_{j,l}(x) - d_n\} + C_l I\{g_{j,l}(x) - d_n < y < g_{j,u}(x) + d_n\}$$
$$b_{u,j}(x) := I\{y \le g_{j,l}(x) - d_n\} + C_u I\{g_{j,l}(x) - d_n < y < g_{j,u}(x) + d_n\}$$

contain every function in $\mathcal{F}_n$. Moreover, the $L^2$-length of each such bracket is bounded by $(C_u - C_l)(2d_n + \varepsilon) \sup f_{X,Y}(x, y) \le C\varepsilon$. This completes the proof.

Part 4. Follows by standard arguments.

Part 5. Follows from $|I\{v \le g_1(u)\} - I\{v \le g_2(u)\}| \le I\{|v - g_1(u)| \le 2\|g_1 - g_2\|_\infty\}$.

Part 6. Obvious. □
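The "standard arguments" behind part 4 of Lemma B.2 can be illustrated numerically. The sketch below assumes, for simplicity, that P is the uniform distribution on [0, 1] (with a density bounded by c the mesh would be divided by c); the function names are hypothetical. Indicators $I\{u \le s\}$ with $t_{j-1} \le s \le t_j$ are bracketed by $[I\{u \le t_{j-1}\}, I\{u \le t_j\}]$, whose $L^2(P)$ size is $\sqrt{t_j - t_{j-1}}$, so a grid of mesh at most $\varepsilon^2$ gives about $\varepsilon^{-2}$ brackets of size at most $\varepsilon$.

```python
import math


def indicator_brackets(eps):
    """Grid cells (t_{j-1}, t_j) of mesh <= eps**2; each cell indexes one bracket."""
    m = math.ceil(1.0 / eps**2)
    grid = [j / m for j in range(m + 1)]
    return list(zip(grid[:-1], grid[1:]))


def bracket_l2_size(lower_t, upper_t):
    """L2 size of [I{u <= lower_t}, I{u <= upper_t}] under the uniform law on [0, 1]."""
    return math.sqrt(upper_t - lower_t)
```

For `eps = 0.1` this produces on the order of 100 brackets, each of size about 0.1, in line with the $C\varepsilon^{-2}$ rate stated in the lemma.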

Lemma B.3 (Basic Lemma) Assume that the classes of functions $\mathcal{F}_n$ consist of uniformly bounded functions (by a constant not depending on n).

1. If for some $a < 2$ we have $N_{[\,]}(\mathcal{F}_n, \varepsilon, L^2(P)) \le C\exp(-c\varepsilon^{-a})$ for every $\varepsilon \le \delta_n$ with constants C, c not depending on n, then

$$\sup_{f \in \mathcal{F}_n,\, \|f\|_{P,2} \le \delta_n} \sqrt{n}\Big(\int f\,dP_n - \int f\,dP\Big) = o_P^*(1),$$

where the * denotes outer probability, see van der Vaart and Wellner (1996) for a more detailed discussion.

2. If $N_{[\,]}(\mathcal{F}_n, \varepsilon, L^2(P)) \le C\varepsilon^{-a}$ for every $\varepsilon \le \delta_n$, some $a > 0$ and a constant C not depending on n, then we have for any $\delta_n \sim n^{-b}$ with $b < 1/2$

$$\sup_{f \in \mathcal{F}_n,\, \|f\|_{P,2} \le \delta_n} \sqrt{n}\Big(\int f\,dP_n - \int f\,dP\Big) = O_P^*(\delta_n |\log \delta_n|).$$
Proof. Start by observing that the uniform boundedness of elements of $\mathcal{F}_n$ by D implies that $F = D$ is a measurable envelope function with $L^2$-norm D. The proof of the first part follows by arguments similar to those used for the proof of the second part and is therefore omitted. For the proof of the second part, note that for $\eta_n$ sufficiently small

a(rj n ) := r^D/yfl + log N {] ( Vn D, T n , L 2 {P)) > %/ v l + logC-alog(%) 

Vn\ 

for some finite constant $\tilde C$ depending only on a, C, D. Thus the bound in Theorem 2.14.2 in van der Vaart and Wellner (1996) yields for $\delta_n$ sufficiently small



E 



sup / fda r 
feT n ■ 



< DJ {] {5 n ,F n) L 2 {P)) + V^J F(u)I{F(u) > ^a(5 n )}P(du) 

< DC, I*" | \oge\de + D^i{d > ££^\ 

JO 1 |l0g<J n | J 

< DC 2 5 n \\og5 n \+DV^l{l>^^}. 

I |logd n |J 

where $\alpha_n := \sqrt{n}(P_n - P)$, $P_n$ denotes the empirical measure, and $C_1, C_2$ are some finite constants. Here, the second inequality follows by a straightforward calculation and the first inequality is due to the fact that for $\delta_n$ sufficiently small by definition

$$\int_0^{\delta_n} \sqrt{1 + \log N_{[\,]}(\varepsilon D, \mathcal{F}_n, L^2(P))}\,d\varepsilon \le C_1 \int_0^{\delta_n} |\log \varepsilon|\,d\varepsilon.$$

Now under the assumption on $\delta_n$, the indicator in the last line will be zero for n large enough and thus the proof is complete. □
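The "straightforward calculation" bounding the integral of $|\log \varepsilon|$ can be written out as follows; this is a supplementary derivation, with the restriction $\delta \le e^{-1}$ chosen here only for convenience.

```latex
% For 0 < \delta \le e^{-1} we have |\log\varepsilon| = -\log\varepsilon on (0,\delta], hence
\int_0^{\delta} |\log \varepsilon| \, d\varepsilon
  = \big[\, \varepsilon - \varepsilon \log \varepsilon \,\big]_0^{\delta}
  = \delta \,(1 + |\log \delta|)
  \le 2\, \delta \, |\log \delta| ,
% where the last step uses |\log\delta| \ge 1 for \delta \le e^{-1}.
```

With $\delta = \delta_n \to 0$ this gives a bound of order $\delta_n |\log \delta_n|$, which is the rate appearing in part 2 of Lemma B.3.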

Lemma B.4 Assume that $\kappa$ is a symmetric, uniformly bounded density with support [-1, 1] and let $b_n = o(1)$. Introduce the notation $Q_{G,\kappa,\tau,b_n}(F) := G^{-1}(H_{G,\kappa,\tau,b_n}(F))$.
(a) If the function $F : [0, 1] \to \mathbb{R}$ is strictly increasing and $F^{-1}$ is k times continuously differentiable in a neighborhood of the point $\tau$, we have
H id , K , T , bn (F) = F-\r) + £ %F^{t)^ 1+1 {k) + R n (r) 



=i 11 



with $|R_n(\tau)| \le C_k(\kappa) b_n^k \sup_{|s - \tau| \le b_n} |(F^{-1})^{(k)}(\tau) - (F^{-1})^{(k)}(s)|$, $\mu_i(\kappa) := \int u^i \kappa(u)\,du$ and a constant $C_k$ depending only on k and $\kappa$. In particular, if $F : \mathbb{R} \to [0, 1]$ is strictly increasing and $F^{-1}$ is two times continuously differentiable in a neighborhood of $\tau$ and $G : [0, 1] \to \mathbb{R}$ is two times continuously differentiable in a neighborhood of $F^{-1}(\tau)$ with $G'(F^{-1}(\tau)) > 0$, we have

$$|F^{-1}(\tau) - Q_{G,\kappa,\tau,b_n}(F)| \le R_{n,2} := C b_n^2 \sup_{|s - G \circ F^{-1}(\tau)| \le R_{n,1}} |(G^{-1})'(s)| \sup_{|s - \tau| \le b_n} |(G \circ F^{-1})''(s)|$$

for some constant C that depends only on $\kappa$, where $R_{n,1} := C b_n^2 \sup_{|s - \tau| \le b_n} |(G \circ F^{-1})''(s)|$.

(b) Assume that $\kappa$ is additionally differentiable with Lipschitz-continuous derivative and that
the functions $G, G^{-1}$ have derivatives that are uniformly bounded on any compact subset of
$\mathbb{R}$ [the bound is allowed to depend on the interval]. Then for any increasing function $F$ with
uniformly bounded first derivative we have $|H(F_1) - H(F_2)| \le R_{n,3} + R_{n,4}$ and

\[
|Q_{G,\kappa,\tau,b_n}(F_1) - Q_{G,\kappa,\tau,b_n}(F_2)| \le \sup_{u \in \mathcal{U}(H(F_1), H(F_2))} |(G^{-1})'(u)| \, (R_{n,3} + R_{n,4}),
\]

where the constant $C$ depends only on $\kappa$, $\mathcal{U}(a, b) := [a \wedge b, a \vee b]$, and

\[
R_{n,3} := \frac{C c_n}{b_n} \|F_1 - F_2\|_\infty \sup_{|v - \tau| \le c_n} |(G \circ F^{-1})'(v)|, \qquad
R_{n,4} := R_{n,3}\, \frac{\|F_1 - F\|_\infty + \|F_1 - F_2\|_\infty}{b_n}
\]

with $c_n := b_n + 2\|F_1 - F_2\|_\infty + \|F_1 - F\|_\infty$.

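(c) If, in addition to the assumptions made in (b), the function $F_1$ is two times continuously
differentiable in a neighborhood of $F_1^{-1}(\tau)$ with $F_1'(F_1^{-1}(\tau)) > 0$ and $G$ is two times continuously
differentiable in a neighborhood of $F_1^{-1}(\tau)$ with $G'(F_1^{-1}(\tau)) > 0$, we have

\[
Q_{G,\kappa,\tau,b_n}(F_1) - Q_{G,\kappa,\tau,b_n}(F_2) = \frac{1}{F_1'(F_1^{-1}(\tau))} \int_{-1}^{1} \kappa(v) \Big( F_2(F_1^{-1}(\tau + v b_n)) - F_1(F_1^{-1}(\tau + v b_n)) \Big) dv + R_n,
\]

where

\[
|R_n| \le R_{n,5} + R_{n,6} + \frac{C b_n \sup_{|s-\tau| \le b_n} |(G \circ F_1^{-1})''(s)| \, \|F_1 - F_2\|_\infty + R_{n,4}}{G'(F_1^{-1}(\tau))}
\]

with a constant $C$ depending only on $\kappa$ and

\begin{align*}
R_{n,5} &:= \frac{1}{2} \sup_{u \in \mathcal{U}(H(F_1), H(F_2))} |(G^{-1})''(u)| \, (H(F_1) - H(F_2))^2, \\
R_{n,6} &:= \sup_{u \in \mathcal{U}(H(F_1), G(F_1^{-1})(\tau))} |(G^{-1})''(u)| \cdot |H(F_1) - G(F_1^{-1})(\tau)| \cdot |H(F_1) - H(F_2)|.
\end{align*}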

Proof. The proof of the first part of (a) is essentially a Taylor expansion. Details can be found
in the proof of Lemma A.4 in Volgushev (2006). For a proof of the second part of (a), observe
that by definition $H_{G,\kappa,\tau,b_n}(F) = H_{id,\kappa,\tau,b_n}(F \circ G^{-1})$. Together with the first part we obtain

\[
|H_{id,\kappa,\tau,b_n}(F \circ G^{-1}) - G \circ F^{-1}(\tau)| \le C b_n^2 \sup_{|s-\tau| \le b_n} |(G \circ F^{-1})''(s)| =: R_{n,1},
\]

which yields

\begin{align*}
|G^{-1}(H_{G,\kappa,\tau,b_n}(F)) - F^{-1}(\tau)|
&\le |(G^{-1})'(\xi)| \cdot |H_{id,\kappa,\tau,b_n}(F \circ G^{-1}) - G(F^{-1}(\tau))| \\
&\le C b_n^2 \sup_{|s - G \circ F^{-1}(\tau)| \le R_{n,1}} |(G^{-1})'(s)| \sup_{|s-\tau| \le b_n} |(G \circ F^{-1})''(s)| =: R_{n,2}.
\end{align*}

The proof of (a) is thus complete.

From now on, drop the index of $H$ for the sake of a simpler notation. For a proof of
(b), observe the decomposition

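\begin{align*}
H(F_1) - H(F_2) = {} & -\frac{1}{b_n} \int_0^1 \kappa\Big( \frac{F_1(G^{-1}(u)) - \tau}{b_n} \Big) \big( F_1(G^{-1}(u)) - F_2(G^{-1}(u)) \big) \, du \\
& - \frac{1}{b_n} \int_0^1 \Big[ \kappa\Big( \frac{\xi(u) - \tau}{b_n} \Big) - \kappa\Big( \frac{F_1(G^{-1}(u)) - \tau}{b_n} \Big) \Big] \big( F_1(G^{-1}(u)) - F_2(G^{-1}(u)) \big) \, du
\end{align*}

for some $|\xi(u) - F_2(G^{-1}(u))| \le |F_1(G^{-1}(u)) - F_2(G^{-1}(u))|$. This yields the bound

\[
|H(F_1) - H(F_2)| \le \frac{1}{b_n} \int_0^1 \Big( \Big| \kappa\Big( \frac{F_1(G^{-1}(u)) - \tau}{b_n} \Big) \Big| + \Big| \kappa\Big( \frac{\xi(u) - \tau}{b_n} \Big) - \kappa\Big( \frac{F_1(G^{-1}(u)) - \tau}{b_n} \Big) \Big| \Big) du \times \|F_1 - F_2\|_\infty .
\]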

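Next, observe that by assumption $\kappa$ is Lipschitz continuous and thus we have the inequality

\begin{align*}
\Big| \kappa\Big( \frac{F_1(G^{-1}(u)) - \tau}{b_n} \Big) - \kappa\Big( \frac{\xi(u) - \tau}{b_n} \Big) \Big|
&\le \frac{L |\xi(u) - F_1(G^{-1}(u))|}{b_n} \Big( I\{ |F_1(G^{-1}(u)) - \tau| \le b_n \} + I\{ |\xi(u) - \tau| \le b_n \} \Big) \\
&\le \frac{2L \|F_1 - F_2\|_\infty}{b_n} I\{ |F_1(G^{-1}(u)) - \tau| \le b_n + 2\|F_1 - F_2\|_\infty \} \\
&\le \frac{2L \|F_1 - F_2\|_\infty}{b_n} I\{ |F(G^{-1}(u)) - \tau| \le b_n + 2\|F_1 - F_2\|_\infty + \|F_1 - F\|_\infty \}.
\end{align*}

Similarly

\[
\Big| \kappa\Big( \frac{F_1(G^{-1}(u)) - \tau}{b_n} \Big) - \kappa\Big( \frac{F(G^{-1}(u)) - \tau}{b_n} \Big) \Big|
\le \frac{2L \|F_1 - F\|_\infty}{b_n} I\{ |F(G^{-1}(u)) - \tau| \le b_n + \|F_1 - F\|_\infty \},
\]

and moreover

\[
\Big| \kappa\Big( \frac{F(G^{-1}(u)) - \tau}{b_n} \Big) \Big| \le C\, I\{ |F(G^{-1}(u)) - \tau| \le b_n \}.
\]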



Define $c_n := b_n + 2\|F_1 - F_2\|_\infty + \|F_1 - F\|_\infty$. Note that the monotonicity of $F, G$ implies

\[
\{ u : |F(G^{-1}(u)) - \tau| \le c_n \} \subset [G(F^{-1}(\tau - c_n)), G(F^{-1}(\tau + c_n))]
\]

and

\[
|G(F^{-1}(\tau + c_n)) - G(F^{-1}(\tau - c_n))| \le 2 c_n \sup_{|v - \tau| \le c_n} |(G \circ F^{-1})'(v)|.
\]
In particular, this implies the estimate

\[
\int_0^1 I\{ |F(G^{-1}(u)) - \tau| \le c_n \} \, du \le 2 c_n \sup_{|v - \tau| \le c_n} |(G \circ F^{-1})'(v)|.
\]

Summarizing, we have obtained the bound $|H(F_1) - H(F_2)| \le R_{n,3} + R_{n,4}$ where $C$ denotes
some constant depending only on the kernel $\kappa$. Assertion (b) follows from this estimate and a
Taylor expansion of $G^{-1}$.
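The last step of (b) can be spelled out as follows. This is a sketch that reads the "Taylor expansion of $G^{-1}$" as a first-order expansion (mean value theorem); the intermediate point $\tilde{u}$ is notation introduced here for illustration only.

```latex
% Recall Q(F_i) = G^{-1}(H(F_i)), with the index of H and Q dropped.
% \tilde{u} denotes some point between H(F_1) and H(F_2) (notation of this sketch).
\begin{align*}
|Q(F_1) - Q(F_2)|
  &= |G^{-1}(H(F_1)) - G^{-1}(H(F_2))|
   = |(G^{-1})'(\tilde{u})| \cdot |H(F_1) - H(F_2)| \\
  &\le \sup_{u \in \mathcal{U}(H(F_1), H(F_2))} |(G^{-1})'(u)| \, (R_{n,3} + R_{n,4}),
\end{align*}
% where the last line uses \tilde{u} \in \mathcal{U}(H(F_1), H(F_2))
% and the bound |H(F_1) - H(F_2)| \le R_{n,3} + R_{n,4}.
```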

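For a proof of assertion (c), note that after a substitution

\begin{align*}
& -\frac{1}{b_n} \int_0^1 \kappa\Big( \frac{F_1(G^{-1}(u)) - \tau}{b_n} \Big) \big( F_1(G^{-1}(u)) - F_2(G^{-1}(u)) \big) \, du \\
&\quad = \int_{-1}^{1} (G \circ F_1^{-1})'(\tau + v b_n)\, \kappa(v) \Big( F_2(F_1^{-1}(\tau + v b_n)) - F_1(F_1^{-1}(\tau + v b_n)) \Big) dv \\
&\quad = (G \circ F_1^{-1})'(\tau) \int_{-1}^{1} \kappa(v) \Big( F_2(F_1^{-1}(\tau + v b_n)) - F_1(F_1^{-1}(\tau + v b_n)) \Big) dv + r_n,
\end{align*}

where

\[
|r_n| \le C b_n \sup_{|s - \tau| \le b_n} |(G \circ F_1^{-1})''(s)| \cdot \|F_1 - F_2\|_\infty
\]

by a Taylor expansion of $(G \circ F_1^{-1})'$. A Taylor expansion of $G^{-1}$ yields

\[
\Big| G^{-1}(H(F_1)) - G^{-1}(H(F_2)) - (G^{-1})'(H(F_1)) \big( H(F_1) - H(F_2) \big) \Big|
\le \frac{1}{2} \sup_{u \in \mathcal{U}(H(F_1), H(F_2))} |(G^{-1})''(u)| \, \big( H(F_1) - H(F_2) \big)^2,
\]

where $\mathcal{U}(a, b) := [a \wedge b, a \vee b]$. A Taylor expansion yields

\[
\Big| (G^{-1})'(H(F_1)) - (G^{-1})'(G(F_1^{-1})(\tau)) \Big|
\le \sup_{u \in \mathcal{U}(H(F_1), G(F_1^{-1})(\tau))} |(G^{-1})''(u)| \cdot |H(F_1) - G(F_1^{-1})(\tau)|
\]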

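and combining this with the results obtained so far we arrive at

\begin{align*}
& \Big| Q(F_1) - Q(F_2) - \frac{1}{F_1'(F_1^{-1}(\tau))} \int_{-1}^{1} \kappa(v) \Big( F_2(F_1^{-1}(\tau + v b_n)) - F_1(F_1^{-1}(\tau + v b_n)) \Big) dv \Big| \\
&\quad \le \Big| G^{-1}(H(F_1)) - G^{-1}(H(F_2)) - (G^{-1})'(H(F_1)) \big( H(F_1) - H(F_2) \big) \Big| \\
&\qquad + |H(F_1) - H(F_2)| \cdot \Big| (G^{-1})'(H(F_1)) - (G^{-1})'(G \circ F_1^{-1}(\tau)) \Big| \\
&\qquad + \Big| \frac{H(F_1) - H(F_2)}{G'(F_1^{-1}(\tau))} - \frac{1}{F_1'(F_1^{-1}(\tau))} \int_{-1}^{1} \kappa(v) \Big( F_2(F_1^{-1}(\tau + v b_n)) - F_1(F_1^{-1}(\tau + v b_n)) \Big) dv \Big| \\
&\quad \le R_{n,5} + R_{n,6} + \frac{C b_n \sup_{|s - \tau| \le b_n} |(G \circ F_1^{-1})''(s)| \, \|F_1 - F_2\|_\infty + R_{n,4}}{G'(F_1^{-1}(\tau))}.
\end{align*}

This completes the proof. □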
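The leading factor $1/F_1'(F_1^{-1}(\tau))$ in part (c) comes from combining two derivatives that appear in the proof. A short check, assuming only the differentiability and positivity conditions stated in (c):

```latex
% Inverse function rule for G at the point F_1^{-1}(\tau), and chain rule for G \circ F_1^{-1}:
\begin{align*}
(G^{-1})'\big( G(F_1^{-1}(\tau)) \big) &= \frac{1}{G'(F_1^{-1}(\tau))}, &
(G \circ F_1^{-1})'(\tau) &= \frac{G'(F_1^{-1}(\tau))}{F_1'(F_1^{-1}(\tau))},
\end{align*}
% so that the product of the two factors no longer depends on G:
\[
(G^{-1})'\big( G(F_1^{-1}(\tau)) \big) \cdot (G \circ F_1^{-1})'(\tau) = \frac{1}{F_1'(F_1^{-1}(\tau))} .
\]
```

In particular, the transformation $G$ enters the expansion in (c) only through the remainder $R_n$.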